@mseep/clawdcursor 1.5.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +2264 -0
- package/LICENSE +21 -0
- package/README.md +385 -0
- package/SECURITY.md +44 -0
- package/SKILL.md +503 -0
- package/dist/core/agent-loop/agent.d.ts +42 -0
- package/dist/core/agent-loop/agent.js +1023 -0
- package/dist/core/agent-loop/agent.js.map +1 -0
- package/dist/core/agent-loop/batch-tool.d.ts +25 -0
- package/dist/core/agent-loop/batch-tool.js +218 -0
- package/dist/core/agent-loop/batch-tool.js.map +1 -0
- package/dist/core/agent-loop/coord-scale.d.ts +72 -0
- package/dist/core/agent-loop/coord-scale.js +89 -0
- package/dist/core/agent-loop/coord-scale.js.map +1 -0
- package/dist/core/agent-loop/focus-guard.d.ts +24 -0
- package/dist/core/agent-loop/focus-guard.js +29 -0
- package/dist/core/agent-loop/focus-guard.js.map +1 -0
- package/dist/core/agent-loop/project-mcp.d.ts +97 -0
- package/dist/core/agent-loop/project-mcp.js +253 -0
- package/dist/core/agent-loop/project-mcp.js.map +1 -0
- package/dist/core/agent-loop/prompt.d.ts +45 -0
- package/dist/core/agent-loop/prompt.js +426 -0
- package/dist/core/agent-loop/prompt.js.map +1 -0
- package/dist/core/agent-loop/tool-meta.d.ts +93 -0
- package/dist/core/agent-loop/tool-meta.js +651 -0
- package/dist/core/agent-loop/tool-meta.js.map +1 -0
- package/dist/core/agent-loop/tools.d.ts +38 -0
- package/dist/core/agent-loop/tools.js +2134 -0
- package/dist/core/agent-loop/tools.js.map +1 -0
- package/dist/core/agent-loop/types.d.ts +170 -0
- package/dist/core/agent-loop/types.js +12 -0
- package/dist/core/agent-loop/types.js.map +1 -0
- package/dist/core/agent.d.ts +51 -0
- package/dist/core/agent.js +245 -0
- package/dist/core/agent.js.map +1 -0
- package/dist/core/app-categories.d.ts +67 -0
- package/dist/core/app-categories.js +108 -0
- package/dist/core/app-categories.js.map +1 -0
- package/dist/core/banner.d.ts +70 -0
- package/dist/core/banner.js +245 -0
- package/dist/core/banner.js.map +1 -0
- package/dist/core/classify/capability.d.ts +45 -0
- package/dist/core/classify/capability.js +78 -0
- package/dist/core/classify/capability.js.map +1 -0
- package/dist/core/decompose/llm-decomposer.d.ts +35 -0
- package/dist/core/decompose/llm-decomposer.js +156 -0
- package/dist/core/decompose/llm-decomposer.js.map +1 -0
- package/dist/core/decompose/parser.d.ts +27 -0
- package/dist/core/decompose/parser.js +101 -0
- package/dist/core/decompose/parser.js.map +1 -0
- package/dist/core/observability/correlation.d.ts +19 -0
- package/dist/core/observability/correlation.js +36 -0
- package/dist/core/observability/correlation.js.map +1 -0
- package/dist/core/observability/cost-meter.d.ts +51 -0
- package/dist/core/observability/cost-meter.js +134 -0
- package/dist/core/observability/cost-meter.js.map +1 -0
- package/dist/core/observability/logger.d.ts +61 -0
- package/dist/core/observability/logger.js +550 -0
- package/dist/core/observability/logger.js.map +1 -0
- package/dist/core/router/aliases.d.ts +50 -0
- package/dist/core/router/aliases.js +104 -0
- package/dist/core/router/aliases.js.map +1 -0
- package/dist/core/router/normalize.d.ts +41 -0
- package/dist/core/router/normalize.js +80 -0
- package/dist/core/router/normalize.js.map +1 -0
- package/dist/core/safety.d.ts +126 -0
- package/dist/core/safety.js +568 -0
- package/dist/core/safety.js.map +1 -0
- package/dist/core/sense/a11y-resolver.d.ts +73 -0
- package/dist/core/sense/a11y-resolver.js +76 -0
- package/dist/core/sense/a11y-resolver.js.map +1 -0
- package/dist/core/sense/fingerprint.d.ts +41 -0
- package/dist/core/sense/fingerprint.js +123 -0
- package/dist/core/sense/fingerprint.js.map +1 -0
- package/dist/core/sense/rank.d.ts +70 -0
- package/dist/core/sense/rank.js +192 -0
- package/dist/core/sense/rank.js.map +1 -0
- package/dist/core/sense/reactive-check.d.ts +40 -0
- package/dist/core/sense/reactive-check.js +48 -0
- package/dist/core/sense/reactive-check.js.map +1 -0
- package/dist/core/sense/snapshot.d.ts +19 -0
- package/dist/core/sense/snapshot.js +100 -0
- package/dist/core/sense/snapshot.js.map +1 -0
- package/dist/core/sense/types.d.ts +66 -0
- package/dist/core/sense/types.js +9 -0
- package/dist/core/sense/types.js.map +1 -0
- package/dist/core/sense/ui-map-anchors.d.ts +7 -0
- package/dist/core/sense/ui-map-anchors.js +24 -0
- package/dist/core/sense/ui-map-anchors.js.map +1 -0
- package/dist/core/sense/ui-map-elements.d.ts +5 -0
- package/dist/core/sense/ui-map-elements.js +33 -0
- package/dist/core/sense/ui-map-elements.js.map +1 -0
- package/dist/core/sense/ui-map-find.d.ts +56 -0
- package/dist/core/sense/ui-map-find.js +153 -0
- package/dist/core/sense/ui-map-find.js.map +1 -0
- package/dist/core/sense/ui-map-fuse.d.ts +4 -0
- package/dist/core/sense/ui-map-fuse.js +44 -0
- package/dist/core/sense/ui-map-fuse.js.map +1 -0
- package/dist/core/sense/ui-map-geom.d.ts +3 -0
- package/dist/core/sense/ui-map-geom.js +16 -0
- package/dist/core/sense/ui-map-geom.js.map +1 -0
- package/dist/core/sense/ui-map-holder.d.ts +58 -0
- package/dist/core/sense/ui-map-holder.js +87 -0
- package/dist/core/sense/ui-map-holder.js.map +1 -0
- package/dist/core/sense/ui-map-normalize.d.ts +19 -0
- package/dist/core/sense/ui-map-normalize.js +65 -0
- package/dist/core/sense/ui-map-normalize.js.map +1 -0
- package/dist/core/sense/ui-map-render.d.ts +4 -0
- package/dist/core/sense/ui-map-render.js +34 -0
- package/dist/core/sense/ui-map-render.js.map +1 -0
- package/dist/core/sense/ui-map-resolve.d.ts +41 -0
- package/dist/core/sense/ui-map-resolve.js +59 -0
- package/dist/core/sense/ui-map-resolve.js.map +1 -0
- package/dist/core/sense/ui-map-types.d.ts +66 -0
- package/dist/core/sense/ui-map-types.js +11 -0
- package/dist/core/sense/ui-map-types.js.map +1 -0
- package/dist/core/sense/ui-map.d.ts +29 -0
- package/dist/core/sense/ui-map.js +113 -0
- package/dist/core/sense/ui-map.js.map +1 -0
- package/dist/core/verify/assertions.d.ts +132 -0
- package/dist/core/verify/assertions.js +284 -0
- package/dist/core/verify/assertions.js.map +1 -0
- package/dist/index.d.ts +21 -0
- package/dist/index.js +24 -0
- package/dist/index.js.map +1 -0
- package/dist/llm/browser-config.d.ts +36 -0
- package/dist/llm/browser-config.js +83 -0
- package/dist/llm/browser-config.js.map +1 -0
- package/dist/llm/client.d.ts +268 -0
- package/dist/llm/client.js +1094 -0
- package/dist/llm/client.js.map +1 -0
- package/dist/llm/config.d.ts +79 -0
- package/dist/llm/config.js +375 -0
- package/dist/llm/config.js.map +1 -0
- package/dist/llm/credentials.d.ts +35 -0
- package/dist/llm/credentials.js +491 -0
- package/dist/llm/credentials.js.map +1 -0
- package/dist/llm/external-creds.d.ts +42 -0
- package/dist/llm/external-creds.js +169 -0
- package/dist/llm/external-creds.js.map +1 -0
- package/dist/llm/providers.d.ts +123 -0
- package/dist/llm/providers.js +717 -0
- package/dist/llm/providers.js.map +1 -0
- package/dist/paths.d.ts +31 -0
- package/dist/paths.js +147 -0
- package/dist/paths.js.map +1 -0
- package/dist/platform/accessibility.d.ts +139 -0
- package/dist/platform/accessibility.js +670 -0
- package/dist/platform/accessibility.js.map +1 -0
- package/dist/platform/cdp-driver.d.ts +318 -0
- package/dist/platform/cdp-driver.js +1179 -0
- package/dist/platform/cdp-driver.js.map +1 -0
- package/dist/platform/index.d.ts +11 -0
- package/dist/platform/index.js +69 -0
- package/dist/platform/index.js.map +1 -0
- package/dist/platform/keys.d.ts +17 -0
- package/dist/platform/keys.js +129 -0
- package/dist/platform/keys.js.map +1 -0
- package/dist/platform/launch-poll.d.ts +101 -0
- package/dist/platform/launch-poll.js +177 -0
- package/dist/platform/launch-poll.js.map +1 -0
- package/dist/platform/linux.d.ts +173 -0
- package/dist/platform/linux.js +1253 -0
- package/dist/platform/linux.js.map +1 -0
- package/dist/platform/macos.d.ts +136 -0
- package/dist/platform/macos.js +976 -0
- package/dist/platform/macos.js.map +1 -0
- package/dist/platform/native-desktop.d.ts +145 -0
- package/dist/platform/native-desktop.js +936 -0
- package/dist/platform/native-desktop.js.map +1 -0
- package/dist/platform/native-helper.d.ts +130 -0
- package/dist/platform/native-helper.js +592 -0
- package/dist/platform/native-helper.js.map +1 -0
- package/dist/platform/ocr-engine.d.ts +78 -0
- package/dist/platform/ocr-engine.js +363 -0
- package/dist/platform/ocr-engine.js.map +1 -0
- package/dist/platform/ps-runner.d.ts +28 -0
- package/dist/platform/ps-runner.js +228 -0
- package/dist/platform/ps-runner.js.map +1 -0
- package/dist/platform/types.d.ts +397 -0
- package/dist/platform/types.js +15 -0
- package/dist/platform/types.js.map +1 -0
- package/dist/platform/uri-handler.d.ts +75 -0
- package/dist/platform/uri-handler.js +273 -0
- package/dist/platform/uri-handler.js.map +1 -0
- package/dist/platform/wayland-backend.d.ts +53 -0
- package/dist/platform/wayland-backend.js +348 -0
- package/dist/platform/wayland-backend.js.map +1 -0
- package/dist/platform/windows.d.ts +232 -0
- package/dist/platform/windows.js +1210 -0
- package/dist/platform/windows.js.map +1 -0
- package/dist/postbuild.d.ts +10 -0
- package/dist/postbuild.js +98 -0
- package/dist/postbuild.js.map +1 -0
- package/dist/schema/snapshot.d.ts +33 -0
- package/dist/schema/snapshot.js +90 -0
- package/dist/schema/snapshot.js.map +1 -0
- package/dist/shortcuts.d.ts +30 -0
- package/dist/shortcuts.js +261 -0
- package/dist/shortcuts.js.map +1 -0
- package/dist/surface/cli.d.ts +7 -0
- package/dist/surface/cli.js +1556 -0
- package/dist/surface/cli.js.map +1 -0
- package/dist/surface/dashboard.d.ts +8 -0
- package/dist/surface/dashboard.js +1193 -0
- package/dist/surface/dashboard.js.map +1 -0
- package/dist/surface/doctor.d.ts +29 -0
- package/dist/surface/doctor.js +1514 -0
- package/dist/surface/doctor.js.map +1 -0
- package/dist/surface/format.d.ts +10 -0
- package/dist/surface/format.js +37 -0
- package/dist/surface/format.js.map +1 -0
- package/dist/surface/http-utility.d.ts +65 -0
- package/dist/surface/http-utility.js +336 -0
- package/dist/surface/http-utility.js.map +1 -0
- package/dist/surface/mcp-server.d.ts +91 -0
- package/dist/surface/mcp-server.js +280 -0
- package/dist/surface/mcp-server.js.map +1 -0
- package/dist/surface/onboarding.d.ts +15 -0
- package/dist/surface/onboarding.js +184 -0
- package/dist/surface/onboarding.js.map +1 -0
- package/dist/surface/pidfile.d.ts +79 -0
- package/dist/surface/pidfile.js +263 -0
- package/dist/surface/pidfile.js.map +1 -0
- package/dist/surface/readiness.d.ts +45 -0
- package/dist/surface/readiness.js +230 -0
- package/dist/surface/readiness.js.map +1 -0
- package/dist/surface/report.d.ts +68 -0
- package/dist/surface/report.js +341 -0
- package/dist/surface/report.js.map +1 -0
- package/dist/surface/skill-register.d.ts +14 -0
- package/dist/surface/skill-register.js +150 -0
- package/dist/surface/skill-register.js.map +1 -0
- package/dist/surface/version.d.ts +6 -0
- package/dist/surface/version.js +27 -0
- package/dist/surface/version.js.map +1 -0
- package/dist/tools/a11y.d.ts +8 -0
- package/dist/tools/a11y.js +545 -0
- package/dist/tools/a11y.js.map +1 -0
- package/dist/tools/a11y_depth.d.ts +19 -0
- package/dist/tools/a11y_depth.js +455 -0
- package/dist/tools/a11y_depth.js.map +1 -0
- package/dist/tools/agent.d.ts +15 -0
- package/dist/tools/agent.js +248 -0
- package/dist/tools/agent.js.map +1 -0
- package/dist/tools/batch.d.ts +46 -0
- package/dist/tools/batch.js +230 -0
- package/dist/tools/batch.js.map +1 -0
- package/dist/tools/cdp.d.ts +8 -0
- package/dist/tools/cdp.js +233 -0
- package/dist/tools/cdp.js.map +1 -0
- package/dist/tools/compact.d.ts +63 -0
- package/dist/tools/compact.js +418 -0
- package/dist/tools/compact.js.map +1 -0
- package/dist/tools/cost-class.d.ts +38 -0
- package/dist/tools/cost-class.js +117 -0
- package/dist/tools/cost-class.js.map +1 -0
- package/dist/tools/desktop.d.ts +9 -0
- package/dist/tools/desktop.js +346 -0
- package/dist/tools/desktop.js.map +1 -0
- package/dist/tools/electron_bridge.d.ts +41 -0
- package/dist/tools/electron_bridge.js +261 -0
- package/dist/tools/electron_bridge.js.map +1 -0
- package/dist/tools/extras.d.ts +22 -0
- package/dist/tools/extras.js +942 -0
- package/dist/tools/extras.js.map +1 -0
- package/dist/tools/favorites.d.ts +13 -0
- package/dist/tools/favorites.js +137 -0
- package/dist/tools/favorites.js.map +1 -0
- package/dist/tools/introspection.d.ts +13 -0
- package/dist/tools/introspection.js +55 -0
- package/dist/tools/introspection.js.map +1 -0
- package/dist/tools/ocr.d.ts +8 -0
- package/dist/tools/ocr.js +66 -0
- package/dist/tools/ocr.js.map +1 -0
- package/dist/tools/orchestration.d.ts +7 -0
- package/dist/tools/orchestration.js +377 -0
- package/dist/tools/orchestration.js.map +1 -0
- package/dist/tools/playbooks/extract-compose.d.ts +22 -0
- package/dist/tools/playbooks/extract-compose.js +85 -0
- package/dist/tools/playbooks/extract-compose.js.map +1 -0
- package/dist/tools/playbooks/find-replace.d.ts +11 -0
- package/dist/tools/playbooks/find-replace.js +56 -0
- package/dist/tools/playbooks/find-replace.js.map +1 -0
- package/dist/tools/playbooks/index.d.ts +63 -0
- package/dist/tools/playbooks/index.js +70 -0
- package/dist/tools/playbooks/index.js.map +1 -0
- package/dist/tools/playbooks/keys-blocklist.d.ts +24 -0
- package/dist/tools/playbooks/keys-blocklist.js +89 -0
- package/dist/tools/playbooks/keys-blocklist.js.map +1 -0
- package/dist/tools/registry.d.ts +40 -0
- package/dist/tools/registry.js +560 -0
- package/dist/tools/registry.js.map +1 -0
- package/dist/tools/safety-gate.d.ts +16 -0
- package/dist/tools/safety-gate.js +70 -0
- package/dist/tools/safety-gate.js.map +1 -0
- package/dist/tools/scheduler.d.ts +76 -0
- package/dist/tools/scheduler.js +413 -0
- package/dist/tools/scheduler.js.map +1 -0
- package/dist/tools/shortcuts.d.ts +13 -0
- package/dist/tools/shortcuts.js +205 -0
- package/dist/tools/shortcuts.js.map +1 -0
- package/dist/tools/smart.d.ts +15 -0
- package/dist/tools/smart.js +785 -0
- package/dist/tools/smart.js.map +1 -0
- package/dist/tools/types.d.ts +174 -0
- package/dist/tools/types.js +67 -0
- package/dist/tools/types.js.map +1 -0
- package/dist/tools/window-text.d.ts +15 -0
- package/dist/tools/window-text.js +39 -0
- package/dist/tools/window-text.js.map +1 -0
- package/dist/types.d.ts +122 -0
- package/dist/types.js +41 -0
- package/dist/types.js.map +1 -0
- package/native/Package.swift +38 -0
- package/native/README.md +113 -0
- package/native/Sources/ClawdCursorHelper/main.swift +602 -0
- package/native/Sources/ClawdCursorHost/main.swift +182 -0
- package/native/Sources/PermissionCheck/main.swift +53 -0
- package/native/Sources/ScreenshotHelper/main.swift +219 -0
- package/native/build.sh +139 -0
- package/native/entitlements.plist +12 -0
- package/package.json +115 -0
- package/scripts/banner.ps1 +112 -0
- package/scripts/coord-accuracy.ps1 +140 -0
- package/scripts/coord-uwp.ps1 +80 -0
- package/scripts/edge-glow.ps1 +180 -0
- package/scripts/find-element.ps1 +198 -0
- package/scripts/get-foreground-window.ps1 +71 -0
- package/scripts/get-screen-context.ps1 +183 -0
- package/scripts/get-windows.ps1 +66 -0
- package/scripts/install-panic-hotkey.ps1 +46 -0
- package/scripts/interact-element.ps1 +431 -0
- package/scripts/invoke-element.ps1 +314 -0
- package/scripts/linux/atspi-bridge.py +356 -0
- package/scripts/linux/ocr-recognize.py +154 -0
- package/scripts/mac/_window-picker.jxa +163 -0
- package/scripts/mac/find-element.jxa +0 -0
- package/scripts/mac/find-element.sh +161 -0
- package/scripts/mac/focus-window.jxa +284 -0
- package/scripts/mac/get-focused-element.jxa +102 -0
- package/scripts/mac/get-foreground-window.jxa +173 -0
- package/scripts/mac/get-screen-context.jxa +197 -0
- package/scripts/mac/get-ui-tree.sh +141 -0
- package/scripts/mac/get-windows.jxa +117 -0
- package/scripts/mac/interact-element.sh +235 -0
- package/scripts/mac/invoke-element.jxa +408 -0
- package/scripts/mac/ocr-recognize.swift +124 -0
- package/scripts/ocr-recognize.ps1 +102 -0
- package/scripts/postinstall-native.js +48 -0
- package/scripts/ps-bridge.ps1 +830 -0
- package/scripts/smoke-mcp.ps1 +119 -0
- package/scripts/sync-version.ts +178 -0
- package/scripts/verify-install.js +81 -0
package/CHANGELOG.md
ADDED
@@ -0,0 +1,2264 @@
+# Changelog
+
+All notable changes to Clawd Cursor will be documented in this file.
+
+## [1.5.5] - 2026-06-16 — the skill follows the install (cross-framework)
+
+### Fixed
+
+- **MCP-direct installs got the tools but not the skill.** The cross-framework
+  skill registration (Claude Code, OpenClaw, Codex, Cursor) lived *only* inside
+  `clawdcursor doctor` — which the MCP-first onboarding explicitly tells people to
+  skip. So an agent connected over MCP saw bare tools with none of the "how to use
+  me" knowledge (fallback positioning, the el_NN UI map, sustainable/autonomous
+  execution via the daemon + `task`), and clawdcursor stopped appearing as a skill.
+  Registration is now extracted into a shared module and runs on **`consent`** (the
+  always-required step) and via a new **`clawdcursor register-skill`** command, so
+  the skill installs into every detected agent framework regardless of install path.
+
+### Changed
+
+- **Richer MCP server `instructions`.** Even an agent with no skill file (a host
+  that doesn't support skills) now learns the essentials on connect: drive UI
+  symbolically (`compile_ui` / `find_button` / `find_field` → `{element_id,
+  snapshot_id}`, survives layout shifts), verify with `expect`, the fallback-only
+  positioning, and where to find the full guide (the registered skill or
+  `clawdcursor.com/llms.txt`).
+
+## [1.5.4] - 2026-06-15 — install & distribution hardening
+
+### Changed
+
+- **Installer is now `npm i -g`, not a git-clone-and-build.** The
+  `curl … | bash` / `irm … | iex` one-liners previously cloned the repo and ran
+  `npm install` + `npm run build` on the user's machine — requiring git and a
+  full build toolchain, and diverging from the `npm i -g clawdcursor` the README
+  advertises. They now install the published package globally. macOS still gets
+  a working native helper because the package's `postinstall` builds and
+  ad-hoc-signs it (ad-hoc is `build.sh`'s default). `VERSION=vX.Y.Z` still pins,
+  now via `clawdcursor@X.Y.Z`.
+- **New Claude Code plugin** (`.claude-plugin/plugin.json`) registers the MCP
+  server in compact mode — launched via `npx -y clawdcursor` so there's **no
+  global install to do first** (npx fetches on demand, or uses a global install
+  if present), while still resolving the package `bin` so it survives entry-path
+  refactors — and bundles the root `SKILL.md`. A one-step, config-free install
+  for Claude Code. Manifest version auto-syncs via `scripts/sync-version.ts`
+  (and is guarded by the version-drift test).
+
+### Fixed
+
+- **Back-compat entry point at `dist/index.js`.** v0.x shipped the CLI there;
+  v1.0 moved it to `dist/surface/cli.js`. Hosts that had hard-pinned
+  `node <pkg>/dist/index.js …` (e.g. a hand-written MCP entry in Claude Code's
+  `.claude.json`) silently broke on a routine `npm i -g clawdcursor` upgrade —
+  the MCP server just failed to start with no clear cause. A thin re-export
+  shim (`src/index.ts` → `dist/index.js`) now forwards to the real CLI, so those
+  pinned paths keep working across the move. New configs should still launch the
+  `clawdcursor` bin directly or use the Claude Code plugin, neither of which
+  pins a deep dist path.
+- **`uninstall` no longer dead-ends.** It removes the global `clawdcursor`
+  command, so `clawdcursor install` can't follow it — and the old success
+  message only said how to delete *more*. Uninstall now prints the reinstall
+  one-liner (`npm i -g clawdcursor`, plus the OS turnkey installer), so there's
+  always an obvious way back.
+
+## [1.5.3] - 2026-06-14 — edge-glow indicator + security hardening
+
+### Added
+
+- **Screen-edge "task in progress" glow.** A full-screen, click-through amber
+  glow pulses (dim ↔ bright) on all four screen edges whenever an agent is
+  driving the desktop — ambient, at-a-glance awareness that automation is live.
+  It rides the same lifecycle as the control-banner pill (shown together,
+  hidden together) and never steals focus or intercepts input: a per-pixel-alpha
+  layered window with `WS_EX_NOACTIVATE | WS_EX_TRANSPARENT`. Opt out of just the
+  glow with `CLAWD_NO_GLOW=1` — the pill (and its double-click-to-stop) stays.
+  Windows-only today, like the banner; the API is platform-neutral so
+  macOS/Linux overlays can land later. (`scripts/edge-glow.ps1`)
+
+### Security / hardening
+
+- **Insecure temp files (CWE-377).** The `agent console` terminal scripts were
+  written to a predictable `tmpdir/clawdcursor-task-<time>.{ps1,sh}` path and
+  then executed; they now use a private `fs.mkdtemp()` directory. The macOS
+  screenshot temp moved from `Date.now()` to `crypto.randomUUID()`. A
+  source-invariant guard test keeps predictable temp-file names from returning.
+- **Browser user-data dir** used a `/tmp` fallback that is wrong on Windows —
+  now `os.tmpdir()`. The unreachable pre-adapter launch fallback gained a
+  metacharacter guard so a crafted app name can't escape the PowerShell command.
+- **Code-scanning sweep.** Closed the real CodeQL alerts and documented the
+  false positives (the snapshot fingerprint SHA-1 is a non-credential checksum;
+  an assertion `fs.open` is read-only). The transitive `file-type` advisory was
+  assessed unreachable (the vulnerable ASF path never runs) and dismissed.
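
The temp-file change in the CWE-377 entry can be illustrated with a minimal sketch. The helper names `writePrivateScript` and `screenshotTempPath` and the file name `task.<ext>` are hypothetical, not taken from the package. Only `fs.mkdtemp()`, `crypto.randomUUID()`, and `os.tmpdir()` are named by the changelog.

```typescript
import * as crypto from "node:crypto";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: write a script into a freshly created private
// directory instead of a guessable path directly under the shared temp dir.
function writePrivateScript(contents: string, ext: "ps1" | "sh"): string {
  // mkdtempSync appends random characters to the prefix and creates the
  // directory; on POSIX systems it is created with owner-only permissions.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "clawdcursor-task-"));
  const file = path.join(dir, `task.${ext}`);
  // The "wx" flag fails if the file already exists, so nothing can be
  // pre-planted at the target path.
  fs.writeFileSync(file, contents, { mode: 0o700, flag: "wx" });
  return file;
}

// Hypothetical helper: an unpredictable screenshot path. A random UUID
// replaces a timestamp, which an observer could guess.
function screenshotTempPath(): string {
  return path.join(os.tmpdir(), `screenshot-${crypto.randomUUID()}.png`);
}
```

A caller would execute the returned script path and remove the whole directory afterwards, for example with `fs.rmSync(path.dirname(file), { recursive: true, force: true })`.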
+
+## [1.5.2] - 2026-06-13 — reliability, honest verification, transparency
+
+The theme of this patch is **trust**: the cheap perception path works for
+external agents again, a task can no longer claim success it can't back, and a
+human at the machine always sees (and can stop) automation. Every fix came
+from driving real apps live; all are regression-tested.
+
+### Fixed — perception over MCP (the big ones)
+
+- **`read_screen` returned an empty tree for *every* app over MCP.** It didn't
+  default to the active window's pid, so the accessibility bridge built no
+  tree. It now resolves the foreground pid (parity with `find_element`) on
+  Windows, macOS, and Linux — the flagship cheap-perception path works again.
+- **Every "element not found" stalled ~20 seconds.** The PowerShell bridge
+  emitted nothing for an empty result (the array unrolled to zero objects), so
+  the call timed out; a single match also unwrapped to a bare object and was
+  dropped. Both fixed — a miss now returns in well under a second.
+- **`open_app` launched apps in the background**, so the next focused-window
+  action targeted the wrong window. It now brings the launched window to the
+  foreground.
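
The array-unrolling failure described above (zero matches emit nothing, one match emits a bare object) can also be guarded against on the consuming side. The sketch below is an assumption about how a caller might normalize such output; the changelog does not say where the actual fix lives, and the `BridgeElement` shape and `parseBridgeElements` name are hypothetical.

```typescript
// Hypothetical shape of one element reported by the accessibility bridge.
interface BridgeElement {
  name: string;
  role: string;
}

// Normalize bridge stdout into an array regardless of how the pipeline
// unrolled the result: empty output, a bare object, or a real JSON array.
function parseBridgeElements(stdout: string): BridgeElement[] {
  const text = stdout.trim();
  if (text === "") return []; // nothing printed: treat as "no matches"
  const parsed: unknown = JSON.parse(text);
  if (parsed === null) return [];
  return Array.isArray(parsed)
    ? (parsed as BridgeElement[])
    : [parsed as BridgeElement]; // single match arrived as a bare object
}
```

With this normalization a miss is an empty array that returns immediately, and a single hit is kept instead of being dropped.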
+
+### Fixed — honest results (no false success)
+
+- **Verification integrity.** A task that changed the screen can no longer be
+  marked `done` on evidence that was already true before it acted (an ambient
+  clock, an already-open window). New `file_changed_since_start` assertion
+  proves a file was actually written during the task.
+- **`open_file` on a folder** no longer reports a bare "Opened" when Explorer
+  actually landed on Home — it verifies the folder window opened (and no
+  longer falls back to a Start-Menu search that types into the search box).
+- **`open_uri` now opens `ms-settings:` and similar** COM-handler schemes via a
+  ShellExecute fallback (they have no launchable executable), instead of
+  failing with "no registered handler".
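
The verification-integrity rule can be read as "only evidence that became true because of the task counts". The sketch below illustrates that reading. The function names and the mtime comparison are assumptions; only the assertion name `file_changed_since_start` comes from the changelog.

```typescript
// Assertion name -> whether it held at the time of the check.
type AssertionResults = Record<string, boolean>;

// Assertions that were not true before the task acted but are true afterwards.
function newlySatisfied(before: AssertionResults, after: AssertionResults): string[] {
  return Object.keys(after).filter(
    (name) => after[name] === true && before[name] !== true,
  );
}

// A task may be marked done only if at least one assertion flipped to true.
function canMarkDone(before: AssertionResults, after: AssertionResults): boolean {
  return newlySatisfied(before, after).length > 0;
}

// Assumed semantics for a file_changed_since_start-style check: the file's
// modification time must not be earlier than the task's start time.
function fileChangedSinceStart(mtimeMs: number, taskStartMs: number): boolean {
  return mtimeMs >= taskStartMs;
}
```

An ambient clock that was visible both before and after the task contributes nothing here, while a save indicator that appears only afterwards does.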
+
+### Changed — safety calibration
+
+- **Key blocklist is now two-tier.** Genuinely dangerous combos
+  (Ctrl+Alt+Del, Win+L, force-quit, shutdown) stay hard-blocked; consequential
+  but legitimate ones (Win+D show-desktop, Ctrl+W close-tab, Alt+F4, Win+R…)
+  are now **confirm-tier** — usable with approval instead of dead-ended behind
+  a message that falsely promised a confirm path.
+- **`minimize_window` no longer asks for confirmation** (tier 1, not 2) — it's
+  reversible, and the granular tool now matches the compound `window`
+  `{minimize}` surface that already allowed it.
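
A two-tier key blocklist of the kind described above might look like the following sketch. The combos are the ones the changelog names. The tier names, the normalization scheme, and the function names are assumptions, and key aliases such as `Delete` versus `Del` are deliberately not handled.

```typescript
type KeyTier = "blocked" | "confirm" | "allow";

// Canonicalize a combo for set lookup: trim and lowercase each part, then
// sort the parts so "Alt+Ctrl+Del" and "Ctrl+Alt+Del" compare equal.
function normalizeCombo(combo: string): string {
  return combo
    .split("+")
    .map((part) => part.trim().toLowerCase())
    .filter((part) => part !== "")
    .sort()
    .join("+");
}

// Combos named in the changelog entry; a real list would be longer.
const HARD_BLOCKED = new Set(["Ctrl+Alt+Del", "Win+L"].map(normalizeCombo));
const CONFIRM_TIER = new Set(["Win+D", "Ctrl+W", "Alt+F4", "Win+R"].map(normalizeCombo));

function classifyCombo(combo: string): KeyTier {
  const key = normalizeCombo(combo);
  if (HARD_BLOCKED.has(key)) return "blocked"; // never sent
  if (CONFIRM_TIER.has(key)) return "confirm"; // sent only after approval
  return "allow";
}
```

Checking the hard-blocked set first keeps a combo from being downgraded if it is ever listed in both tiers by mistake.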
+
+### Added — on-screen control banner (transparency)
+
+- **"ClawdCursor — desktop control in progress" banner**: a topmost,
+  no-focus-steal pill at the top-center of the screen with a blinking red
+  recording dot, shown whenever an agent is actively driving the desktop —
+  pinned for the whole run of an autonomous task, and activity-triggered
+  (auto-hides after ~30s idle) for external agents driving over MCP
+  (stdio or HTTP). **Double-click it to stop**: runs the `clawdcursor stop`
+  flow (abort in-flight task → graceful shutdown). The human at the machine
+  always knows, and always has a kill switch. Windows today (macOS/Linux
+  adapters welcome — the controller is platform-neutral); disable with
+  `--no-banner` or `CLAWD_NO_BANNER=1`.
+
+- Unmatched HTTP routes now return a JSON 404 with the endpoint list instead of
+  Express's default HTML error page.
+
+## [1.5.1] - 2026-06-12 — bulletproofing patch (live-session bugs)
+
+Every fix in this patch came from a real failure observed while agents drove
+real UIs — found in live runs, fixed at the root, regression-tested.
+
+### Fixed — safety
+
+- **Coordinate clicks can no longer silently land on the wrong window.** When
+  Windows' foreground-lock defeats the pre-click activation (or the click
+  point is over a different window), `click`/`smart_click` now return a loud
+  **"⚠ FOCUS NOT CONFIRMED — DO NOT type next"** warning with the window that
+  was actually promoted, instead of a hollow success. The trigger was a real
+  keystroke leak: an OTP typed after a missed click went into a background
+  chat window.
+
+### Fixed — `task` delegation no longer times out MCP clients
+
+- `task` / `delegate_to_agent` used to await the **whole** autonomous loop, so
+  any task longer than the client's per-call timeout (~60s) "timed out" while
+  the work finished invisibly. Now it waits up to `timeout` seconds (default
+  45, clamped 1–50): finished → result as before; still running → a
+  `{status:"running"}` receipt with live progress while the loop continues.
+  Re-calling with the **same** task text re-attaches (never restarts); the
+  compact `task` tool gains `{action:"status"}` / `{action:"abort"}`.
+  A client-side timeout is **not** a task failure.
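
The bounded wait described above can be sketched as a race between the running task and a clamped timer. Everything except the numbers (default 45, clamp 1 to 50) and the `{status:"running"}` receipt is an assumption: the registry keyed by task text, the function names, and the receipt type are illustrative only.

```typescript
type TaskReceipt<T> = { status: "done"; result: T } | { status: "running" };

// Default 45 s, clamped to the 1..50 s range stated in the changelog entry.
function clampTimeoutSeconds(requested?: number): number {
  const value = requested === undefined || Number.isNaN(requested) ? 45 : requested;
  return Math.min(50, Math.max(1, value));
}

// Hypothetical registry of in-flight tasks, keyed by the task text.
const inFlight = new Map<string, Promise<unknown>>();

async function runTaskBounded<T>(
  taskText: string,
  start: () => Promise<T>,
  timeoutSeconds?: number,
): Promise<TaskReceipt<T>> {
  // Re-attach to an existing run for the same task text; never restart it.
  let running = inFlight.get(taskText) as Promise<T> | undefined;
  if (running === undefined) {
    running = start().finally(() => inFlight.delete(taskText));
    // Keep a late failure from surfacing as an unhandled rejection when no
    // caller is waiting; callers that await `running` still see the error.
    running.catch(() => undefined);
    inFlight.set(taskText, running);
  }

  const done = running.then<TaskReceipt<T>>((result) => ({ status: "done", result }));
  let timer: ReturnType<typeof setTimeout> | undefined;
  const stillRunning = new Promise<TaskReceipt<T>>((resolve) => {
    const ms = clampTimeoutSeconds(timeoutSeconds) * 1000;
    timer = setTimeout(() => resolve({ status: "running" }), ms);
  });

  try {
    // Whichever settles first decides the receipt; the task itself keeps
    // running in the background if the timer wins.
    return await Promise.race([done, stillRunning]);
  } finally {
    clearTimeout(timer);
  }
}
```

A caller that receives `{ status: "running" }` can call `runTaskBounded` again with the same task text to pick up the result once the loop finishes.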
+
+### Fixed — perception honesty
+
+- Window/element guards (`expect:{window:...}`) now normalize invisible
+  Unicode — Edge's title contains a no-break space in "Microsoft Edge" that
+  made correct guards fail.
+- The a11y → CDP DOM fallback verifies the connected page actually corresponds
+  to the **focused** window before answering; it no longer reports another
+  browser's buttons as if they were on the focused page.
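
The window-guard normalization can be sketched as follows. Which characters the real implementation folds is not stated in the changelog, so the character ranges below, the case folding, and the substring match are all assumptions.

```typescript
// Fold invisible or look-alike whitespace so visually identical titles compare equal.
function normalizeTitle(title: string): string {
  return title
    .replace(/[\u200B-\u200D\u2060]/g, "") // drop zero-width characters
    .replace(/\s+/g, " ") // \s covers U+00A0 (no-break space) and other Unicode spaces
    .trim()
    .toLowerCase();
}

// Hypothetical guard: does the live window title contain the expected text?
function windowMatches(actualTitle: string, expected: string): boolean {
  return normalizeTitle(actualTitle).includes(normalizeTitle(expected));
}
```

With this folding, a guard written with an ordinary space matches a title such as `"Bing - Microsoft\u00A0Edge"`.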
+
+### Fixed — ergonomics
+
+- The agent's dedicated browser launches maximized (fresh profiles used to
+  open as a tiny window).
+- `consent` / README / website / `doctor --help` now state the two-path
+  onboarding truth: MCP setup is `consent` + (macOS) `grant` — `doctor` is
+  only for the autonomous `agent` mode. macOS: Accessibility is required;
+  Screen Recording is optional (vision fallback only).
+
+## [1.5.0] - 2026-06-11 — UI State Compiler + reactive verification
+
+The headline of this release is a new perception substrate and a verification
+discipline that together let a cheap text model drive the desktop reliably,
+without reaching for screenshots. No tools were renamed — existing editor
+permission allowlists keep working; v1.5.0 only **adds** capability.
+
+### Added — the el_NN UI State Compiler
+
+- **`compile_ui`** fuses the accessibility tree and OCR into ONE confidence-scored,
+  source-attributed UI map: every element gets a stable `el_NN` id, a role, a
+  name, coordinates, and capability flags. Act on an element symbolically via
+  `{element_id, snapshot_id}` — near-free in tokens, DPI-proof, and it survives
+  layout shifts.
+- **Semantic finders** `find_action_button(intent)` / `find_input_field(purpose)`
+  locate a target by meaning (synonyms + geometric label association) and return
+  the `el_NN` to act on, escalating to OCR only when the a11y tree is sparse.
+- These are reachable from BOTH the granular surface and the compact
+  `accessibility` compound (`action: "compile_ui" | "find_button" | "find_field"`).
+
+### Added — reactive step discipline (Layer C)
+
+- Consequential actions (`invoke_element`, `set_field_value`, `type`, `key`,
+  `click`, `drag`, …) take an optional **`expect`** array of assertions. After the
+  action, clawdcursor verifies the stated outcome — polling for a short settle
+  window so asynchronous UIs (chip resolution, lazy title updates) aren't falsely
+  failed — and reports a **DEVIATION** when the UI didn't obey, instead of
+  reporting a hollow success. The agent adapts rather than building on a false
+  assumption.
+- A new `move` (hover) action and a stepped `drag` `path` (curve tracing) round
+  out the canvas/gesture surface.
|
|
230
|
+
|
|
231
|
+
### Fixed — agent-loop reliability (internal audit)

- The post-action UI map is no longer invalidated the instant it's advertised —
`el_NN` refs offered for the next turn now actually resolve.
- Ref freshness no longer races the LLM round-trip (TTL widened; event-driven
invalidation + the window guard are the real staleness signals).
- `batch` steps now get the FULL single-call pipeline: label resolution for the
safety gate, active-app refresh between steps, outcome-gated map invalidation,
and per-step `expect` verification.
- Coordinate-space default follows context (image-space only while a screenshot
is actually in context; it no longer latches on for the rest of a run).
- Every screen-derived tool output (a11y, OCR, page DOM, clipboard) is wrapped in
`<untrusted-screen-content>` delimiters — prompt-injection defense now covers
every perception path, not just two.
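
As a rough illustration of the delimiter wrapping in the last bullet, the sketch below wraps screen-derived text and neutralizes any delimiter already embedded in it. Only the `<untrusted-screen-content>` tag name comes from the changelog; the closing-tag form, the `[source: …]` label, the neutralization step, and the function name `wrapUntrusted` are assumptions.

```typescript
const OPEN = "<untrusted-screen-content>";
const CLOSE = "</untrusted-screen-content>"; // assumed closing form

// Wraps text read from the screen so the model can treat it as data, not instructions.
// Assumption: embedded delimiters are rewritten so screen text cannot close the wrapper early.
function wrapUntrusted(source: string, text: string): string {
  const neutralized = text
    .split(CLOSE).join("[/untrusted-screen-content]")
    .split(OPEN).join("[untrusted-screen-content]");
  return `${OPEN}\n[source: ${source}]\n${neutralized}\n${CLOSE}`;
}
```

Applying one helper to every perception path (a11y, OCR, page DOM, clipboard) is what makes the coverage uniform, since no path can forget the wrapping.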
### Fixed — external-agent (MCP) surface

- The `el_NN` substrate is now reachable over stdio MCP (a session UIMap holder
is constructed for the editor-hosted server, not only the daemon).
- The safety gate resolves `el_NN` refs to their element label over MCP too, so
destructive-label gating (Send/Delete/Pay) fires the same as in-loop; a
caller-supplied `expect` is honored on the MCP route.
- `cdp_connect` / `browser_connect` now disclose when they **attached to your
existing browser session** vs launched a dedicated agent-owned instance.
- `get_value` reads the editor's text via TextPattern (Windows) / non-empty
AXValue (macOS) when ValuePattern is empty — fixes false "value is blank"
reads on Win11 Notepad and the duplicate-write retries they caused.
- `read_clipboard` output is untrusted-wrapped; `close_window` warns it discards
all tabs/documents; dead `system` compound actions removed; `shortcuts_list`
drops platform-empty keys and de-duplicates.

### Changed — security & browser ownership (post-RC hardening, same release)

- **Loopback-only bind is now enforced.** The daemon refuses to start when
`server.host` is a non-loopback address unless launched with
`--allow-remote` (which prints a loud warning). If you deliberately bind to
`0.0.0.0`/a LAN IP, add the flag; otherwise set the host back to `127.0.0.1`.
- **The agent's dedicated browser moved to its own CDP port** (`9333`, env
`CLAWD_AGENT_CDP_PORT`); port `9223` is now reserved for browsers *you* put
on the wire (`relaunch_with_cdp`, your own debug flags). Ownership is encoded
in the port, the dedicated instance's window is labeled
*"ClawdCursor — agent browser"*, and in attached mode navigation mechanically
opens the agent's **own tab** — your tabs are never navigated away.
- `mouse_triple_click` follows up with select-all when it lands in an edit
field, so typing after it replaces pre-filled text (Save As dialogs).
- Dependencies: commander 15, zod 4 (the MCP SDK peer-supports both), tsx 4.22.4.
- CI: coverage ratchet thresholds + a production-path perf tripwire join the
existing npm-audit gate; the MCP SDK boundary is now explicitly typed.
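
The loopback-only rule in the first bullet of this section amounts to a start-up gate on the configured host. A minimal sketch follows, assuming "loopback" means `localhost`, `::1`, or anything in `127.0.0.0/8`; the function names and the returned shape are hypothetical, and the real daemon may classify hosts differently.

```typescript
// Assumed definition of loopback: localhost, ::1, and 127.0.0.0/8.
function isLoopbackHost(host: string): boolean {
  const h = host.trim().toLowerCase();
  if (h === "localhost" || h === "::1" || h === "[::1]") return true;
  const octets = h.split(".");
  return (
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255) &&
    octets[0] === "127"
  );
}

// Decides whether the daemon may bind; `allowRemote` mirrors the `--allow-remote` flag.
function checkBind(host: string, allowRemote: boolean): { ok: boolean; warning?: string } {
  if (isLoopbackHost(host)) return { ok: true };
  if (allowRemote) return { ok: true, warning: `binding to non-loopback host ${host}` };
  return { ok: false };
}
```

With this shape, `checkBind("0.0.0.0", false)` refuses to start, while the same host with the flag set is allowed but carries a warning for the caller to print.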
### Fixed — macOS parity (cross-platform audit)

- **el_NN now works on macOS.** The role map was Windows-UIA-only, so macOS AX
text fields and links resolved to "unknown" and the find/fill/link-click path
was effectively dead — added the AX role synonyms.
- **macOS password fields are redacted.** Secureness lives in the AX *subrole*
(`AXSecureTextField`); the helper now reads it and withholds the value, so a
secret never reaches the prompt or the fingerprint.
- The no-coordinate `scroll` center is computed in the driver's coordinate space
(logical points on Retina) instead of mislanding 2× off.
- macOS UI-tree traversal deepened to match Windows (depth 8), so `compile_ui`
sees real apps instead of a near-empty tree.
- README corrected: `clawdcursor grant` approves permissions; it does not build
the native helper.

## [1.0.4] - 2026-06-07 — fix Windows minimize/resize (#153)

- **`window minimize` (and `window resize`) silently did nothing on Windows.**
Root cause: the PowerShell those commands run is built as a single concatenated
line and executed via `powershell.exe -Command <string>`, but it opened the
`Add-Type -MemberDefinition` block with a **here-string** (`@"…"@`). A here-string
header must be the last token on its line — on a single line PowerShell raises
*"No characters are allowed after a here-string header before the end of the line"*
and the **entire script fails to parse**, so the call produced no output and
returned `false`. Reported for UWP apps (Calculator/Settings) but it affected
every window. Switched to a single-quoted `-MemberDefinition '…'` (C# double-quotes
are literal inside it). Fixed in `setWindowState` (minimize/maximize/restore/close)
and `setWindowBounds` (resize); a static guard test prevents the here-string from
returning.
- Minimize now also drives the transition through the UIA `WindowPattern`
(`SetWindowVisualState`) with a title-first window lookup, the supported
cross-process path for UWP / ApplicationFrameHost-hosted windows whose Win32
`ShowWindow(SW_MINIMIZE)` no-ops; falls back to `ShowWindowAsync` for plain Win32.
Verified live on Calculator: minimize / restore / maximize / restore all succeed.
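
To make the here-string fix concrete, here is a hedged sketch of building a one-line PowerShell command with a single-quoted `-MemberDefinition`, plus a static guard of the kind the entry mentions. The real script text is not shown in the changelog, so `buildMinimizeCommand`, `hasHereString`, and the `Win32.Native` type name are hypothetical stand-ins.

```typescript
// Hypothetical builder: produces a single line suitable for `powershell.exe -Command <string>`.
function buildMinimizeCommand(hwnd: number): string {
  // Single-quoted PowerShell string, so the C# double quotes inside are literal.
  const member =
    "'[DllImport(\"user32.dll\")] public static extern bool ShowWindowAsync(IntPtr hWnd, int nCmdShow);'";
  return [
    `Add-Type -Namespace Win32 -Name Native -MemberDefinition ${member}`,
    `[Win32.Native]::ShowWindowAsync([IntPtr]${hwnd}, 6)`, // 6 is SW_MINIMIZE
  ].join("; ");
}

// Static guard: flags a here-string header (@" or @'), which cannot appear mid-line.
function hasHereString(script: string): boolean {
  return /@["']/.test(script);
}
```

A guard like `hasHereString` is deliberately coarse: it may flag harmless text containing `@"`, which is acceptable for a test whose only job is to stop the construct from returning.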
## [1.0.3] - 2026-06-07 — fix macOS install/update loop (#155)

- **macOS updates were blocked after the first install.** `native/build.sh` writes
the helper into the git tree (`native/ClawdCursor.app/`, `native/.build/`), but
those weren't gitignored — so `install.sh`'s clean-tree guard saw a "dirty" tree
and refused every subsequent update. Now gitignored, and the generated
`native/ClawdCursor.app/Contents/Info.plist` (which made git descend into the
`.app` and surface the untracked binaries) is untracked — `build.sh` regenerates
it. The `.app` is built on-device and was never in the npm package.
- `clawdcursor uninstall` now also removes the native build artifacts.

## [1.0.2] - 2026-06-07 — resilient uninstall

- **`clawdcursor uninstall` no longer crashes on Windows when a file is locked.**
A still-held handle on `~/.clawdcursor` (a running daemon, or the process's own
log file) raised `EPERM`, which escaped as an `unhandledRejection` and aborted
the uninstall half-done (config removed, global link + data dir left behind).
Each removal step now retries transient locks (`rmSync` maxRetries) and, on a
hard failure, warns + continues + lists the leftovers to delete manually —
instead of crashing the whole command.

## [1.0.1] - 2026-06-06 — first npm publish + code-scanning cleanup

- First v1.x release published to the npm registry (`npm i -g clawdcursor`).
- Cleaned 4 CodeQL `js/unused-local-variable` notes (dead `shotToBlock` helper in
agent.ts, unused `beforeEach`/`invokeTool` in the characterization test, unused
`STEPS` const in scripts/measure-batch-tokens.ts). No behavior change.

## [1.0.0] - 2026-06-06 — toolbox-first: pipeline removed, tools unified, thin agent loop

> **Breaking (major).** clawdcursor is now a desktop MCP **toolbox** for any agent, plus a thin *optional* autonomous loop. The autonomous morph pipeline (router → blind/hybrid/vision, decompose, verify, reflector) is gone — a capable model is its own pipeline. The `task` tool still hands a whole task to a cheaper configured model that "takes the wheel"; 4 pipeline-introspection tools were removed (catalog 98 → 94).

### macOS

- **#154 (HiDPI/Retina mouse):** clicks/drags/moves no longer land ~2× off-target — mouse coords now map image-space → **logical** points on macOS (nut-js drives in logical points), physical on Windows/Linux. *(Correct by construction; needs real-Mac verification.)*
- **#150 / #151:** native helper bundle is signable (Info.plist generated, comment-free entitlements) and the mac/linux runtime scripts ship in the package. *(Confirmed on a real Mac, macOS 26.)*
- **#149:** screenshot helper inherits the daemon's Screen-Recording grant — ad-hoc signing no longer uses hardened runtime. *(Pending real-Mac re-verification.)*
- `window focus` by `processId` / `processName` now works on macOS (the JXA flag names were wrong).

### Perception — cheap-first guidance made explicit

The MCP connect-time instructions and tool descriptions now spell out the escalation: read the accessibility tree first → OCR when the tree is empty/sparse → screenshot only as a last resort; prefer named-target actions over pixel coordinates. Every tool also carries a `[act] < [inspect] < [perceive-text] < [perceive-image]` cost-class prefix.

### Removed — autonomous pipeline cluster (~13,000 LOC)

The router → blind/hybrid/vision morph ladder, preprocessor, decomposer, classifier,
verifier (ground-truth signals), Reflector, and knowledge/guide loader have all been
deleted. The file surface removed:

- `src/core/pipeline.ts`, `src/core/verifier.ts`, `src/core/compound.ts`,
`src/core/palettes.ts`, `src/core/handoff.ts`, `src/core/desktop-survey.ts`
- `src/core/classify/` (full directory)
- `src/core/decompose/` (full directory)
- `src/core/skills/` (full directory)
- `src/core/router/` (full directory)
- `src/core/knowledge/` (full directory)

Four granular tools removed alongside the pipeline:
`classify_task`, `detect_app`, `get_app_guide`, `learn_app`.

The `clawdcursor guides` CLI command is removed.

### Changed — thin agent loop replaces the morph ladder

`agent.ts` is rewired to a single `runAgent` loop: the configured model perceives the
desktop (a11y → OCR → screenshot as needed), selects tools, and iterates until done or
the turn budget is exhausted. No rung selection, no mode flags, no rung escalation.
`AgentInput` is simplified: `task / maxTurns / isAborted / targetWindow` only.

`buildUnifiedTools()` and `buildSystemPrompt()` no longer accept a mode or capability
argument — they return the full unified toolbox.

### Changed — MCP tool count

Granular catalog drops from 98 to **94 tools** (the four pipeline-only tools removed).
Compact surface: `computer` · `accessibility` · `window` · `system` · `browser` · `task` · `batch` = **7 entries**.

### Changed — `task` delegation

`submit_task` → `agent.executeTask` → `_executeTask` → `runAgent`. The thin loop is the
configured model self-driving the toolbox. Framing: an expensive external agent can
delegate grunt work to clawdcursor's cheaper configured model, which takes the wheel.

### Added — `batch` tool

New `batch` tool collapses N tool calls into one round-trip (declarative, guarded,
safety-gated per step). Each step is `{ name, arguments, expect? }`; optional `expect`
re-perceives before the step and halts on mismatch. On any guard miss, safety stop, or
error the batch halts and returns a per-step trace. `dryRun:true` pre-scans safety tiers
without executing. The efficiency lever for a driving agent: N calls → 1.
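
The halting rules read naturally as a loop over steps. The sketch below is one possible reading, not the shipped implementation: the step shape `{ name, arguments, expect? }` and the `dryRun` flag come from the text, while the `Host` interface, the string-valued `expect`, the status labels, and `runBatch` itself are assumptions.

```typescript
// `expect` is modeled as a substring the current perception must contain (an assumption).
type Step = { name: string; arguments: Record<string, unknown>; expect?: string };
type TraceEntry = {
  name: string;
  status: "ok" | "guard-miss" | "safety-stop" | "error" | "would-run";
};

// Hypothetical host hooks standing in for perception, the safety gate, and tool dispatch.
interface Host {
  perceive(): string;
  isDestructive(step: Step): boolean;
  call(step: Step): void;
}

function runBatch(steps: Step[], host: Host, dryRun = false): TraceEntry[] {
  const trace: TraceEntry[] = [];
  for (const step of steps) {
    const destructive = host.isDestructive(step);
    if (dryRun) {
      // Pre-scan only: classify every step, execute nothing.
      trace.push({ name: step.name, status: destructive ? "safety-stop" : "would-run" });
      continue;
    }
    if (destructive) {
      trace.push({ name: step.name, status: "safety-stop" });
      break;
    }
    // Re-perceive before the step and halt on mismatch.
    if (step.expect !== undefined && !host.perceive().includes(step.expect)) {
      trace.push({ name: step.name, status: "guard-miss" });
      break;
    }
    try {
      host.call(step);
      trace.push({ name: step.name, status: "ok" });
    } catch {
      trace.push({ name: step.name, status: "error" });
      break;
    }
  }
  return trace;
}
```

Returning the trace even on a halt is what lets the driving agent see exactly which step stopped the batch and why.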
---

### Tool-unification migration (also part of 1.0.0)

### Changed — one tool implementation, used everywhere

The MCP tool surface and the internal autonomous agent-loop used to carry **two
parallel implementations** of ~35 of the same tools (~2,100 LOC of duplication).
The MCP surface now **projects from the agent-loop (System B) implementations** via
`projectToToolDefinition`, so external agents inherit the reliability tweaks that
were previously internal-only: smushed-coordinate coercion, focus-theft
detection/reporting, automatic pid-scoping for a11y searches, the clipboard
paste fast-path, and conditional coordinate scaling.

- ~34 tools migrated (window, keyboard, mouse, a11y/perception, CDP). **Tool names
are unchanged — no renames** (the MCP catalog stays at 98 tools), so existing
editor/agent permission allowlists keep working. Parameters are backward-compatible
with one exception: `mouse_drag` drops the `x1/y1/x2/y2` convenience aliases (use the
canonical `startX/startY/endX/endY`, which are unchanged).
- Tools where System A is richer or unique are **kept on System A**: `ocr_read_screen`
(structured `elements[]`+bounds output), `smart_*`, `find_element`,
`navigate_browser` (the browser *launcher*), `cdp_evaluate/select/wait/tabs/scroll`,
and the extra mouse variants.
- A shared characterization test-suite pins the System B behaviors so the projection
can't silently regress them.
- (Pending) deletion of the now-dead System A handler bodies — the LOC drop lands
in a follow-up; this release makes System B the single source of truth.

### Fixed

- **Packaging (#151):** the published package now ships the macOS (`scripts/mac/`)
and Linux (`scripts/linux/`) runtime scripts. Previously only Windows `.ps1` files
were whitelisted, so accessibility/window/OCR tools were dead on mac/Linux installs
— the same class of bug as the earlier Windows-bridge omission.
- **macOS native helper (#150):** `native/build.sh` now generates `Contents/Info.plist`
(without it the `.app` is an invalid, unsignable bundle) and `entitlements.plist` no
longer contains XML comments that `codesign`'s AMFI parser rejects. Unblocks the
signed-bundle path that TCC (Accessibility / Screen Recording) and #149 depend on.
(Final macOS sign/run verification is tracked in #150 / #149.)
- **Compact-surface friction:** native-name aliases stop the MCP validator from
silently dropping a correctly-intended arg; a central required-arg guard converts the
crash-on-undefined class into actionable errors; `open_app`/`open_file`/`open_url` are
reachable from the `system` compound (not just `window`); an unknown action now names
the compound that owns it; `key_press` accepts space-separated key sequences.
- **a11y consistency:** `smart_click` / `smart_type` / `smart_read` accept `name` as an
alias for `target` (the rest of the accessibility surface uses `name`).
- Confirm-tier safety and `task`-unavailable error messages are now actionable.

### Behavior changes (v2)

### Migration notes (v2 behavior change)

**`mouse_click` / `mouse_drag` / `mouse_scroll` — `space:'screen'` no longer double-scales**

External MCP callers that omit the `space` parameter are **unaffected** — omitting `space` continues to default to `'image'`, which applies the same image→physical scaling that all previous releases applied.

The one behavior change is for callers that explicitly pass `space:'screen'`:

| Caller behavior | v1.x result | v2 result |
|---|---|---|
| `{x, y}` (no `space`) | scaled (image→physical) | scaled (image→physical) — **unchanged** |
| `{x, y, space:'image'}` | scaled (image→physical) | scaled (image→physical) — **unchanged** |
| `{x, y, space:'screen'}` | **double-scaled** (bug) | pass-through — **fixed** |

If your agent passes a11y-snapshot coordinates via `mouse_click` / `mouse_drag` / `mouse_scroll` and previously compensated by dividing by the DPI ratio before sending, remove that compensation after upgrading.
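
The v2 contract in the table can be summarized in a few lines. This is a sketch of the described behavior rather than the project's code: `toDriverPoint` and the `scale` parameter (the assumed per-axis ratio between the driver's coordinate space and the screenshot) are hypothetical names.

```typescript
type Space = "image" | "screen";
type Point = { x: number; y: number };

// Maps a caller-supplied point into the coordinate space the mouse driver expects.
function toDriverPoint(p: Point & { space?: Space }, scale: Point): Point {
  const space: Space = p.space ?? "image"; // omitted `space` keeps the legacy default
  if (space === "screen") {
    return { x: p.x, y: p.y }; // already in screen space: pass through, no second scaling
  }
  return { x: Math.round(p.x * scale.x), y: Math.round(p.y * scale.y) };
}
```

For example, with a scale of 1.5 on both axes, `{x: 100, y: 50}` maps to `(150, 75)`, while `{x: 100, y: 50, space: 'screen'}` stays at `(100, 50)`. A caller that used to pre-divide screen coordinates by the DPI ratio would now land short of the target, which is why the compensation has to go.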
### Implementation notes

- `mouse_click`, `mouse_drag`, `mouse_scroll`, `mouse_move_relative`, `mouse_down`, `mouse_up` are now projected from System B (`buildUnifiedTools`) via `projectToToolDefinition` (the same uniform path used by the window and keyboard groups in Steps 3–4).
- The projected coord-sensitive tools (`click`, `drag`, `scroll`) inject `space:'image'` as the default when the caller omits it, preserving the legacy scaling contract.
- System A handlers for these six tools are intentionally kept (Step 8 handles removal).
- Tools left on System A (no System B granular equivalent): `mouse_hover`, `mouse_double_click`, `mouse_right_click`, `mouse_middle_click`, `mouse_triple_click`, `mouse_scroll_horizontal`, `mouse_drag_stepped`.
- **`mouse_drag`**: the `x1/y1/x2/y2` convenience aliases are removed; use the canonical `startX/startY/endX/endY` (unchanged, still required). Callers already using the canonical names are unaffected.

**`mouse_scroll` — `x` and `y` are no longer required**

System A required `x`, `y`, and `direction`. In v2 only `direction` is required; omitting `x`/`y` scrolls at the screen center (safe default). Callers that always supply `x`/`y` are unaffected.

| Caller behavior | v1.x result | v2 result |
|---|---|---|
| `{x, y, direction}` | scrolls at (x,y) | scrolls at (x,y) — **unchanged** |
| `{direction}` (no x/y) | schema validation error (x/y required) | scrolls at screen center |

**`key_press` — `key` param removed from JSON-Schema `required` array**

System A's JSON schema listed `key` as required. In v2 the schema lists neither `combo` nor `key` as required (the execute body still guards the total absence and returns an actionable error). Callers supplying the `key` param are fully unaffected; the only change is that MCP-level schema validation no longer rejects a missing-key call before it reaches the handler.

| Caller behavior | v1.x result | v2 result |
|---|---|---|
| `{key: "Return"}` | runs normally | runs normally — **unchanged** |
| `{}` (no key) | schema validation error | handler-level error (actionable message) |

**`set_field_value` — category corrected from `'window'` to `'perception'`**

TOOL_META had `set_field_value` category as `'window'`; System A's `a11y_depth.ts` definition uses `'perception'`. The mismatch is corrected: the projected tool now reports `category: 'perception'`, matching the System A original. This is a routing/metadata fix with no behavioral change.

**`invoke_element` — `automationId` matching now falls back to name-based search**

The `automationId` parameter is accepted for backward-compat but the `PlatformAdapter.invokeElement` interface does not expose automationId filtering. When a caller passes only `automationId` (no `name`), the value is used as the `name` search string, which is a best-effort fallback.

| Caller behavior | v1.x result | v2 result |
|---|---|---|
| `{name: "OK"}` | name-based a11y match | same — **unchanged** |
| `{automationId: "btn_ok"}` | exact automationId match | uses `automationId` as name string (best-effort) |
| `{name: "OK", automationId: "btn_ok"}` | name + automationId match | name is used; automationId is accepted but not narrowing |

For precise automationId targeting, prefer `find_element` (which filters by automationId) followed by `invoke_element` with the found element's `name`.
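
The three table rows reduce to a simple precedence rule for the search string. A minimal sketch, assuming a hypothetical helper `resolveInvokeName`; the error raised when both fields are missing is also an assumption, since the changelog does not describe that case.

```typescript
type InvokeArgs = { name?: string; automationId?: string };

// Picks the string handed to the name-based a11y search.
function resolveInvokeName(args: InvokeArgs): string {
  if (args.name) return args.name;                 // name wins; automationId does not narrow
  if (args.automationId) return args.automationId; // best-effort fallback
  throw new Error("invoke_element needs `name` or `automationId`"); // assumed behavior
}
```

Because the fallback matches on display name rather than on the automation id itself, an id such as `btn_ok` only resolves when some element's name happens to contain it, which is the reason the two-step `find_element` route is the safer choice.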
**`cdp_connect` — now auto-launches a browser when none is running**

Previously `cdp_connect` only attached to an already-running Chrome/Edge process.
In v2 it auto-launches Edge/Chrome with the CDP debug port if no browser is connected.

| Caller behavior | v1.x result | v2 result |
|---|---|---|
| No browser running | error "Failed to connect…" | launches Edge/Chrome, then connects |
| Browser already running | attaches | attaches — **unchanged** |

If you previously launched the browser manually (via `navigate_browser`) before calling `cdp_connect`, that workflow continues to work. The new behavior is additive.

**`cdp_page_context` — gains an optional `selector` param**

Previously `cdp_page_context` took no parameters and always returned the full structured
interactive-element list for the page.
In v2 callers may pass an optional CSS `selector`; when present, the tool returns the
plain-text content of the matching element instead of the full element list.

| Caller behavior | v1.x result | v2 result |
|---|---|---|
| No params | structured interactive-element list | same — **unchanged** |
| `{selector: "main"}` | invalid param (ignored or error) | text content of `main` element |

Callers that pass no params are fully unaffected. The no-param path returns the same
`getPageContext()` result as before.

### Implementation notes (Step 7 — CDP / browser group)

- `cdp_connect`, `cdp_page_context`, `cdp_click`, `cdp_type` are now projected from System B
(`buildUnifiedTools`) via `projectToToolDefinition` (the same uniform path used by Steps 3–6).
- System A handlers for these four tools are intentionally kept (Step 8 handles removal).
- **`navigate_browser` is NOT migrated.** System A's `navigate_browser` is a browser-launcher
tool (`safetyTier 2`, `category: 'orchestration'`) that spawns Edge/Chrome with
`--remote-debugging-port`. System B's `browser_navigate` is a within-session navigation call
that requires a prior `browser_connect`. Projecting `browser_navigate` as `navigate_browser`
would silently strip the launch capability and break external callers.
- Tools left on System A (no System B equivalent in `buildUnifiedTools()`):
`navigate_browser`, `cdp_read_text`, `cdp_select_option`, `cdp_evaluate`,
`cdp_wait_for_selector`, `cdp_list_tabs`, `cdp_switch_tab`, `cdp_scroll`.

---

## [1.0.0-autonomous] - 2026-06-03 — adaptive pipeline variant (superseded by the toolbox 1.0.0; preserved on branch `v1.0.0-autonomous`)

### Upgrading from 0.9.x

**MCP server id.** The server id has been `clawdcursor` since v0.9.0 (it
was `clawd-cursor` before that). If your editor re-prompts for every tool
call after upgrading, your allowlist entries are keyed to the old id or to
individual tool names. Switch to the **server-level wildcard**:

```
mcp__clawdcursor
```

A single wildcard entry covers all current and future tools and survives
tool renames across versions — per-tool entries like
`mcp__clawdcursor__window` silently break whenever a tool is added,
removed, or renamed.

### Added — text ↔ vision handoff in the adaptive pipeline

The pipeline now switches between text-only and vision rungs mid-task
when the verifier signals a mismatch, rather than restarting. Spatial
gestures (drag into / onto) correctly morph to the vision rung instead of
staying blind.

### Added — cost-class metadata on all 97 granular tools

Every granular tool is stamped with a `costClass` (`act` / `inspect` /
`perceive-text` / `perceive-image`). The class is exposed in the MCP
`tools/list` description prefix so external agents can select the
cheapest viable tool without reading the full schema.
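
An external agent could use the prefix roughly as follows. This sketch assumes the prefix is a bracketed tag at the start of the description (for example `[perceive-text] …`) and that the classes order from cheapest to most expensive as `act`, `inspect`, `perceive-text`, `perceive-image`; `costClassOf` and `cheapest` are hypothetical helper names.

```typescript
// Assumed ordering, cheapest first.
const COST_ORDER = ["act", "inspect", "perceive-text", "perceive-image"] as const;
type CostClass = (typeof COST_ORDER)[number];

// Reads the assumed "[class]" prefix from a tool description, if present.
function costClassOf(description: string): CostClass | undefined {
  const match = /^\[([a-z-]+)\]/.exec(description);
  const tag = match?.[1];
  return COST_ORDER.find((c) => c === tag);
}

// Returns the candidate with the lowest cost class; unclassified tools sort last.
function cheapest<T extends { description: string }>(candidates: T[]): T | undefined {
  const rank = (tool: T): number => {
    const c = costClassOf(tool.description);
    return c === undefined ? COST_ORDER.length : COST_ORDER.indexOf(c);
  };
  return [...candidates].sort((a, b) => rank(a) - rank(b))[0];
}
```

The selection only needs the description string, which is the point of putting the class in the prefix: a client can rank candidates straight from the `tools/list` response.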
### Added — desktop-survey grounding for the preprocessor

The preprocessor and decomposer now plan from live desktop perception
(open windows + OS-default handlers) instead of static app guesses.
The stay-in-target-window guardrail refuses actions against windows that
were not open when the task started.

### Added — intent-driven email compose-send

`compose-send` only auto-fires the Send action when the task description
explicitly requests sending. Tasks that ask to draft or compose leave a
pre-filled draft open instead of dispatching immediately.

### Added — CDP/DOM browser rung for the autonomous agent

For web tasks the autonomous agent can drive a dedicated, agent-owned
browser through the DOM (CSS selectors / visible text, no pixels) instead
of OCR-on-the-desktop plus coordinate clicks. The instance is launched with
its own profile so it never closes, reuses, or steals focus from the user's
own browser windows. Degrades gracefully to OCR (`read_text` / `smart_click`)
when CDP isn't available.

### Added — OCR perception on the cheap text rung

`read_text` and `smart_click` let the text model read and click webview /
canvas content via OCR — no escalation to the vision model.

### Fixed — npm package shipped without the Windows bridge + OCR scripts (critical)

`scripts/ps-bridge.ps1` (the persistent UIA bridge) and `scripts/ocr-recognize.ps1`
were never in the package.json `files` whitelist, so a real `npm install` shipped
without them. On Windows the bridge crashed on every spawn in an infinite restart
loop, leaving the whole desktop-perception layer dead — `list_windows` returned 0,
the accessibility tree was empty, OCR failed — so the agent could launch apps but
was blind. This affected every published install (0.9.7–0.9.9); it was masked in
development by `npm link`. Now `scripts/*.ps1` ships in the package.

### Added — Windows panic-stop hotkey

`scripts/install-panic-hotkey.ps1` installs a global keyboard shortcut
(default Ctrl+Alt+K) that force-kills every clawdcursor process — the daemon and
its PowerShell UIA/OCR children — instantly, for when an autonomous run misbehaves.

### Fixed — Save As filename field on Windows

The granular `set_field_value` → `invoke-element set-value` path in
`ps-bridge.ps1` lacked the composite handling added to the compound
`set_value` path in v0.9.7. The "File name:" label is a read-only Text
element; the fix resolves the writable sibling Edit control via
`LabeledBy` before writing, with a keyboard-sequence fallback.

### Fixed — CLI flags honoured in non-interactive mode

`--provider` and `--model` flags passed to `clawdcursor agent` were
silently ignored when no TTY was attached. The config-reading path now
applies CLI flags before falling back to the config file on all entry
points.

### Fixed — keyboard / typing / open_app could hang over MCP (tools-only)

Over `clawdcursor mcp` (stdio) and `agent --no-llm` (HTTP), `key_press`,
`type_text`, and `open_app` could hang indefinitely. Root cause: a latent
zombie-promise in the persistent PowerShell/UIA bridge runner — when the
bridge exited before signalling ready, the startup promise was never
settled, so any awaiter hung forever. The bridge now rejects and recovers,
and the cosmetic active-window lookup in `key_press`/`type_text` is
time-boxed so a slow or recovering bridge can never block a keystroke. The
full LLM agent path was unaffected.

### Changed — retired hardcoded in-app choreography constants

Per-app tab-order and keystroke constants (e.g. `tabsAfterRecipient`) are
removed; the pipeline derives sequencing from live accessibility-tree
inspection instead.

## [0.9.9] - 2026-05-24 — security hardening + registry perf

### Security — AppleScript backslash escaping + crypto host token (PR #136)

From a full triage of the open CodeQL alerts (only 2 were genuine; the
other 20 were by-design for a local single-user tool and were dismissed
with justifications):

- **AppleScript injection (CodeQL #61–64, HIGH).**
`buildMacWindowTargetClause` escaped `"` but not `\` before embedding
`processName`/`title` into an `osascript -e` double-quoted string. `\` is
an AppleScript escape character and these fields are LLM/screen-supplied,
so a value containing a backslash could break out of the string literal.
Now escapes `\` then `"` at all four sites (macOS-only path).
- **Host-helper token (CodeQL #77, HIGH).** Replaced `Math.random()` (not
cryptographically secure) with `crypto.randomBytes(24)`, and the
check-then-write with an exclusive create (`flag: 'wx'`) that reads the
existing token on `EEXIST` — closing a TOCTOU window.
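
The escaping order in the AppleScript fix matters, and a short sketch shows why. The helper names below are hypothetical; only the rule itself (escape `\` first, then `"`) comes from the entry above.

```typescript
// Escapes a value for embedding inside an AppleScript double-quoted string literal.
function escapeAppleScriptString(value: string): string {
  // Backslashes first: otherwise the backslashes added for quotes would be doubled again.
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Hypothetical clause builder showing where the escaped value ends up.
function processClause(processName: string): string {
  return `process "${escapeAppleScriptString(processName)}"`;
}
```

With quote-only escaping, an input ending in a backslash followed by a quote would leave the backslash to escape the inserted escape character, so the quote would terminate the literal. Escaping the backslash first removes that opening.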
### Performance — memoize the granular tool registry (PR #116)

`getTool(name)` resolved via `getAllTools().find(...)` and `getTools()`
re-spread all 14 `get*Tools()` sources on every call, so every single-tool
lookup (the dispatch hot path) rebuilt the entire registry. The granular
definitions are static, so they're now assembled once and cached;
`getTools()`/`getAllTools()` still return fresh copies (mutation-safe), and
`getTool()` searches the cache directly. No behavior change.
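
The caching pattern described here can be sketched in a few lines. The two inline sources stand in for the 14 `get*Tools()` functions, and `ToolDef`, `registry`, and `buildCount` are illustrative names rather than the project's actual internals.

```typescript
type ToolDef = { name: string; description: string };

// Hypothetical sources standing in for the real `get*Tools()` functions.
let buildCount = 0;
const sources: Array<() => ToolDef[]> = [
  () => [{ name: "key_press", description: "press a key" }],
  () => [{ name: "mouse_click", description: "click at a point" }],
];

let cached: ToolDef[] | undefined;

// Assembles the registry on first use and reuses it afterwards.
function registry(): ToolDef[] {
  if (cached === undefined) {
    buildCount += 1;
    cached = ([] as ToolDef[]).concat(...sources.map((get) => get()));
  }
  return cached;
}

// Returns a fresh (shallow) array copy so callers cannot mutate the cached list.
function getAllTools(): ToolDef[] {
  return [...registry()];
}

// Searches the cache directly: no copy, no rebuild.
function getTool(name: string): ToolDef | undefined {
  return registry().find((tool) => tool.name === name);
}
```

Note that the copy here is shallow: the array is fresh but the tool objects are shared, which is sufficient only if callers never mutate individual definitions.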
## [0.9.8] - 2026-05-24 — complete the Toolbox + registry metadata + site refresh

### Added — smart_* and URI escape hatches reach the compound Toolbox (PR #135)

Three useful granular tools were orphaned from the recommended 6-tool
compound surface; they're now wired in (cross-OS — each underlying tool was
already cross-platform, this only changes dispatch):

- **`accessibility`** gains `smart_click` / `smart_type` / `smart_read` —
auto-fallback OCR → a11y → CDP by element text, no coordinates.
- **`system`** gains `open_uri` / `build_uri` / `learn_app` — the URI escape
hatches (`mailto:` `tel:` `slack:` `vscode:` `spotify:` `file:` …) that
accomplish an intent without driving UI, plus a guide-write companion to
`app_guide`. `open_uri` dispatches via macOS `open`, Linux `xdg-open`, and
Windows registered-handler resolution.

Safety: `safety.ts` gains matching `publicCompoundMap` + `TOOL_TIER` entries
so the new actions gate correctly on the compound path (`open_uri` /
`learn_app` → destructive, `build_uri` → read), not the `input` default.

### Changed — npm registry metadata (PR #132)
|
|
714
|
+
|
|
715
|
+
Added `mcpName: io.github.AmrDab/clawdcursor` (for the official MCP
|
|
716
|
+
registry), refreshed the stale package description to the current
|
|
717
|
+
local-MCP-server / fallback-layer positioning, and added `mcp-server` /
|
|
718
|
+
`gui-automation` keywords.
|
|
719
|
+
|
|
720
|
+
### Changed — website refresh (PR #134)
|
|
721
|
+
|
|
722
|
+
Hero headline restored to "A cursor and a keyboard for any AI agent";
|
|
723
|
+
install section rebuilt as a segmented tab bar (`npm` · Windows · macOS/Linux
|
|
724
|
+
· Source) with npm a first-class option; tool-surface labels aligned to the
|
|
725
|
+
README's Toolbox / Tools naming.
|
|
726
|
+
|
|
727
|
+
### Fixed — CI: mcp-orphan-teardown flake on Windows (PR #133)
|
|
728
|
+
|
|
729
|
+
The test is no longer skipped on Windows (the platform the orphan bug it
|
|
730
|
+
guards lived on). It runs with a 20s exit budget instead of 5s — tolerating
|
|
731
|
+
slow native-module teardown on Windows runners while still catching a
|
|
732
|
+
genuine hang. The earlier Node-20-only skip wrongly assumed Node 22 was
|
|
733
|
+
immune.
|
|
734
|
+
|
|
735
|
+
## [0.9.7] - 2026-05-23 — GUI reliability + safety/efficiency tuning + npm install
|
|
736
|
+
|
|
737
|
+
First release published to **npm** — `npm i -g clawdcursor` now works on
|
|
738
|
+
any OS. Bundles the fixes that landed on `main` after v0.9.6.
|
|
739
|
+
|
|
740
|
+
### Fixed — Save As dialog reliability on Windows (PR #128, #122 + #123)
|
|
741
|
+
|
|
742
|
+
- **`set_field_value` on a ComboBox+Edit composite** (e.g. the Save As
|
|
743
|
+
filename field) returned `set_field_value failed for 'undefined'`. Fixed
|
|
744
|
+
with a PS-level inner-Edit-child retry plus a TS keyboard fallback that
|
|
745
|
+
targets the widest-bounds element sharing the name (the input, not the
|
|
746
|
+
label) when ValuePattern is absent (Win11 XAML dialogs).
|
|
747
|
+
- **Clicks could land on a background window** when a dialog sat over
|
|
748
|
+
another window (focus/DPI race). `WindowsAdapter.mouseClick` now calls
|
|
749
|
+
`ensureForegroundAtPoint(x, y)` first — `WindowFromPoint` →
|
|
750
|
+
`GetAncestor(GA_ROOT)`, a no-op fast path when already foreground, else
|
|
751
|
+
the `AttachThreadInput` + `SetForegroundWindow` dance to beat the
|
|
752
|
+
Windows foreground lock.
|
|
753
|
+
- #121 (triple_click in Save As) was reviewed and intentionally **not**
|
|
754
|
+
changed: `mouse_triple_click` is documented as "selects a paragraph",
|
|
755
|
+
so rerouting it to Ctrl+A globally would break that contract elsewhere.
|
|
756
|
+
|
|
757
|
+
### Fixed — safety gate no longer flags typed prose (PR #127, #124)
|
|
758
|
+
|
|
759
|
+
The destructive-label patterns (`\bsend\b`, `\bconfirm\b`, …) are meant
|
|
760
|
+
for the label of a control being *activated* (clicked/invoked), but the
|
|
761
|
+
MCP gate also matched them against the `text` payload of typing tools.
|
|
762
|
+
Typing "…verification to confirm reliable automation" tripped a confirm
|
|
763
|
+
gate. Fixed by skipping the patterns for the canonical typing tools
|
|
764
|
+
(`type_text`, `cdp_type`) via a `TYPING_TOOLS` denylist — click/invoke
|
|
765
|
+
label safety (incl. `cdp_click` by visible text) is fully preserved.
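A rough sketch of the gate after this change, under a few assumptions: the function name `labelNeedsConfirm` is hypothetical, only the two patterns quoted above are included, and case-insensitive matching is assumed rather than confirmed.

```typescript
// Patterns quoted in the entry above; the real list is longer.
const DESTRUCTIVE_LABEL_PATTERNS: RegExp[] = [/\bsend\b/i, /\bconfirm\b/i];

// Tools whose `text` payload is typed prose, not the label of an activated control.
const TYPING_TOOLS = new Set<string>(["type_text", "cdp_type"]);

// Hypothetical gate: returns true when the call should trip a confirmation.
function labelNeedsConfirm(tool: string, text: string): boolean {
  if (TYPING_TOOLS.has(tool)) return false; // typed prose is never a control label
  return DESTRUCTIVE_LABEL_PATTERNS.some((pattern) => pattern.test(text));
}
```

With this shape, `cdp_click` on a button labelled "Send" still gates, while typing a sentence that happens to contain "confirm" does not.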
|
|
766
|
+
|
|
767
|
+
### Added — explicit token-cost hierarchy in the agent prompt (PR #129)
|
|
768
|
+
|
|
769
|
+
`buildSystemPrompt` (also served to external agents via
|
|
770
|
+
`get_system_prompt`) now states the cost ladder so any agent climbs
|
|
771
|
+
cheap→expensive deliberately: act (click/type/key) < inspect
|
|
772
|
+
(find_element/get_element) < read a11y tree / OCR (read_screen) <
|
|
773
|
+
screenshot. Reinforces "read the attached a11y snapshot before spending
|
|
774
|
+
a screenshot."
|
|
775
|
+
|
|
776
|
+
### Security — qs DoS bump (PR #126)
|
|
777
|
+
|
|
778
|
+
`qs` 6.14.2 → 6.15.2 (transitive via express/supertest) — patches a
|
|
779
|
+
remotely-triggerable `qs.stringify` DoS.
|
|
780
|
+
|
|
781
|
+
### Added — npm install + website/README npm one-liner
|
|
782
|
+
|
|
783
|
+
`clawdcursor` is now published to npm. README Quickstart and the website
|
|
784
|
+
Install section lead with `npm i -g clawdcursor` (with the macOS
|
|
785
|
+
native-helper note); the OS installer scripts remain for the
|
|
786
|
+
clone-build-link path that handles the macOS native build automatically.
|
|
787
|
+
|
|
788
|
+
## [0.9.6] - 2026-05-22 — key_press crash fix + auth-hardening + docs catchup + CI stabilization
|
|
789
|
+
|
|
790
|
+
### Fixed — `key_press` crashed on non-printable keys (PR #125, fixes #120)
|
|
791
|
+
|
|
792
|
+
A live test driving the compact MCP surface end-to-end (Outlook email +
|
|
793
|
+
Paint drawing, tools only) surfaced that `computer.key` /
|
|
794
|
+
`key_press` threw `Cannot read properties of undefined (reading
|
|
795
|
+
'toLowerCase')` on `Backspace`, `Enter`, `Tab`, `Delete`, and `Ctrl+*`
|
|
796
|
+
combos. Root cause: `normalizeKey()` in `src/platform/keys.ts`
|
|
797
|
+
called `.toLowerCase()` on its argument without guarding against
|
|
798
|
+
non-string / empty input, so any code path that reached it with an
|
|
799
|
+
unexpected value crashed instead of degrading gracefully.
|
|
800
|
+
|
|
801
|
+
`normalizeKey()` now validates its input and throws a clear,
|
|
802
|
+
debuggable error (`expected a non-empty string`) instead of a cryptic
|
|
803
|
+
`TypeError`; `native-desktop.ts` guards the parsed-key path the same
|
|
804
|
+
way. The fix sits on the shared `NativeDesktop` path that
|
|
805
|
+
`computer.key` traverses on **all three platforms** (Windows, macOS,
|
|
806
|
+
Linux). Test coverage: 9 cases at
|
|
807
|
+
`src/__tests__/keys-normalization.test.ts` covering valid combos plus
|
|
808
|
+
empty/undefined/non-string inputs. Thanks to first-time contributor
|
|
809
|
+
@xxiaoxiong.
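The validation described above might look roughly like this. It is a sketch only: the real `normalizeKey()` in `src/platform/keys.ts` also handles key aliases that the changelog does not show, and the exact error wording beyond `expected a non-empty string` is an assumption.

```typescript
// Sketch of the input guard; alias handling from the real function is omitted.
function normalizeKey(key: unknown): string {
  if (typeof key !== "string" || key.trim() === "") {
    // Clear, debuggable error instead of "Cannot read properties of undefined".
    throw new Error(`normalizeKey: expected a non-empty string, got ${typeof key}`);
  }
  return key.trim().toLowerCase();
}
```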
|
|
810
|
+
|
|
811
|
+
### Docs — `Toolbox` / `Tools` naming + restored action-enum tables (PR #111)
|
|
812
|
+
|
|
813
|
+
The repositioning in #93 inadvertently stripped the per-toolbox action
|
|
814
|
+
enum tables that v0.9.3 shipped. Readers landing on the post-v0.9.4
|
|
815
|
+
README saw vague descriptions like *"computer — Mouse, keyboard,
|
|
816
|
+
screenshot. Raw I/O."* with no way to discover the ~70 verbs each
|
|
817
|
+
compound tool actually exposes short of querying `tools/list`. The
|
|
818
|
+
tables are restored verbatim from v0.9.3, and the two sections are
|
|
819
|
+
labeled **`Toolbox` — 6 compound tools (recommended)** and **`Tools`
|
|
820
|
+
— 97 granular primitives** to make the catalog choice unambiguous.
|
|
821
|
+
|
|
822
|
+
### Security — dashboard cookie auth instead of inline-JS token injection
|
|
823
|
+
|
|
824
|
+
The dashboard at `/` no longer injects the bearer token into client
|
|
825
|
+
JS. The previous flow set `var __TOKEN = '__CLAWD_TOKEN_PLACEHOLDER__'`
|
|
826
|
+
in the served HTML so dashboard JS could send `Authorization: Bearer`
|
|
827
|
+
on `/mcp` calls — which meant any future XSS, a malicious browser
|
|
828
|
+
extension, or a host misbind to a non-loopback address could exfiltrate
|
|
829
|
+
the live token and execute the full MCP tool catalog.
|
|
830
|
+
|
|
831
|
+
The server now sets `clawdcursor_token` as a `httpOnly` + `sameSite:
|
|
832
|
+
strict` cookie when serving `/`. Dashboard JS no longer carries the
|
|
833
|
+
token at all; `fetch('/mcp', …)` relies on the browser auto-attaching
|
|
834
|
+
the cookie on same-origin requests. The auth gate at
|
|
835
|
+
`src/surface/http-utility.ts` accepts both `Authorization: Bearer`
|
|
836
|
+
headers (used by external tooling) and the cookie (used by the
|
|
837
|
+
dashboard) — backward-compatible for any script that authenticates by
|
|
838
|
+
header.
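The dual-source gate can be sketched as a pure function over request headers. Everything here except the cookie name `clawdcursor_token` and the `Authorization: Bearer` scheme is an assumption: the function names are hypothetical, header keys are assumed to arrive lower-cased, and the header is assumed to win when both are present.

```typescript
type HeaderBag = { [name: string]: string | undefined };

// Pulls a named cookie out of a raw Cookie header (no URL-decoding, for brevity).
function readCookie(cookieHeader: string | undefined, cookieName: string): string | undefined {
  if (!cookieHeader) return undefined;
  for (const part of cookieHeader.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    if (part.slice(0, eq).trim() === cookieName) return part.slice(eq + 1).trim();
  }
  return undefined;
}

// Bearer header first (external tooling), then the dashboard cookie.
function extractToken(headers: HeaderBag): string | undefined {
  const auth = headers["authorization"];
  if (auth && auth.slice(0, 7) === "Bearer ") return auth.slice(7);
  return readCookie(headers["cookie"], "clawdcursor_token");
}

// A production gate would likely use a constant-time comparison here.
function isAuthorized(headers: HeaderBag, expected: string): boolean {
  const presented = extractToken(headers);
  return presented !== undefined && presented === expected;
}
```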
|
|
839
|
+
|
|
840
|
+
### Security — `requireAuth` no longer silently accepts on-disk token rotation by default
|
|
841
|
+
|
|
842
|
+
`requireAuth` previously fell back to reading `~/.clawdcursor/token`
|
|
843
|
+
when the incoming token didn't match the in-memory token. That allowed
|
|
844
|
+
any process with write access to that file to rotate the auth token
|
|
845
|
+
and gain MCP access immediately without restarting the daemon.
|
|
846
|
+
|
|
847
|
+
Drift acceptance is now opt-in via `CLAWD_ALLOW_DISK_TOKEN_DRIFT=1`.
|
|
848
|
+
The default is fail-closed: a request whose token doesn't match the
|
|
849
|
+
in-memory token is rejected, regardless of what's on disk.
|
|
850
|
+
|
|
851
|
+
**Backward-incompatible** for any tooling that rotated the disk token
|
|
852
|
+
to authenticate against a running daemon. Set
|
|
853
|
+
`CLAWD_ALLOW_DISK_TOKEN_DRIFT=1` to restore the previous behavior.
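The fail-closed default can be illustrated with a small decision function. This is a sketch under assumptions: `tokenAccepted` and its parameters are hypothetical, and the real `requireAuth` is HTTP middleware rather than a pure function.

```typescript
// Hypothetical decision function; disk access and env are injected for clarity.
function tokenAccepted(
  presented: string,
  inMemoryToken: string,
  readDiskToken: () => string | undefined,
  env: { [key: string]: string | undefined },
): boolean {
  if (presented === inMemoryToken) return true;
  // Fail closed: the disk token is not even read unless drift is opted in.
  if (env["CLAWD_ALLOW_DISK_TOKEN_DRIFT"] !== "1") return false;
  const onDisk = readDiskToken();
  return onDisk !== undefined && presented === onDisk;
}
```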
|
|
854
|
+
|
|
855
|
+
### CI — global nut-js mock for Linux runners
|
|
856
|
+
|
|
857
|
+
`tests/vitest.setup.ts` wires a global mock for `@nut-tree-fork/nut-js`
|
|
858
|
+
so vitest can boot on Linux CI runners that don't have libXtst /
|
|
859
|
+
libxdo installed. Existing per-file `vi.mock('@nut-tree-fork/nut-js',
|
|
860
|
+
…)` declarations continue to override the global, so no existing
|
|
861
|
+
test behavior changes. Method names in the global mock match
|
|
862
|
+
production usage in `src/platform/native-desktop.ts` (`mouse.click`,
|
|
863
|
+
`screen.grabRegion`, etc.) so the global is a usable fallback for
|
|
864
|
+
new tests.
|
|
865
|
+
|
|
866
|
+
### CI — skip `mcp-orphan-teardown` on Windows + Node 20.x (PR #118)
|
|
867
|
+
|
|
868
|
+
`tests/mcp-orphan-teardown.test.ts` failed intermittently on the
|
|
869
|
+
`windows-latest / Node 20.x` matrix slot — always with `process did
|
|
870
|
+
not exit within 5000ms`, always passing on rerun. Same failure family
|
|
871
|
+
as the existing headless-Linux skip: `clawdcursor mcp` loads heavy
|
|
872
|
+
native modules (nut-js, sharp's libvips, playwright) whose teardown
|
|
873
|
+
doesn't finish within the 5s exit budget on Node 20 specifically
|
|
874
|
+
(Node 22.x tightened process-exit semantics, so the contract holds
|
|
875
|
+
there). The test now skips on Windows + Node 20.x, preserving coverage
|
|
876
|
+
on macOS, Linux-with-display, and Windows + Node 22.x.
|
|
877
|
+
|
|
878
|
+
|
|
879
|
+
## [0.9.5] - 2026-05-21 — repositioning + compact `task` fix + macOS Tahoe silent screenshots + npm publish prep
|
|
880
|
+
|
|
881
|
+
Three threads landed: a documentation reframe so the README finally
|
|
882
|
+
matches what the product actually is, a real ship-bug fix for one of
|
|
883
|
+
the six headline compact tools, and a macOS 26 Tahoe compatibility
|
|
884
|
+
fix. Also: package metadata is now npm-publish-ready.
|
|
885
|
+
|
|
886
|
+
### Added — README + homepage repositioning (PR #93)
|
|
887
|
+
|
|
888
|
+
After v0.9.4's live tests confirmed external LLMs (Sonnet driving the
|
|
889
|
+
compact MCP surface) consistently passed real tasks via the MCP
|
|
890
|
+
catalog, the documentation now leads with that fact instead of the
|
|
891
|
+
"skill, not an app" framing.
|
|
892
|
+
|
|
893
|
+
- Old tagline: *"A cursor and a keyboard for any AI agent on a real desktop."*
|
|
894
|
+
- New tagline: **"The local MCP server that gives any agent safe desktop control."**
|
|
895
|
+
|
|
896
|
+
Above-the-fold opening triplet now names the three defensible
|
|
897
|
+
architectural claims: **no cloud / no telemetry by default**, **single
|
|
898
|
+
`safety.evaluate()` chokepoint** every tool call routes through, and
|
|
899
|
+
**bearer-token auth on every HTTP request**. Homepage (docs/index.html)
|
|
900
|
+
mirrors the README changes.
|
|
901
|
+
|
|
902
|
+
### Fixed — compact `task` compound returns `success: false` on success (PR #110)
|
|
903
|
+
|
|
904
|
+
The compact `task` action — one of the six headline tools — routes
|
|
905
|
+
through `delegate_to_agent`, which polls `agent_status` until idle
|
|
906
|
+
and then reads `data.lastResult` to report `{success, verified, steps,
|
|
907
|
+
lastAction}` to the caller.
|
|
908
|
+
|
|
909
|
+
But `AgentState` had no `lastResult` field (`src/types.ts:80`). After
|
|
910
|
+
`executeTask()` finished, the result was returned to the direct caller
|
|
911
|
+
but never written onto state. The poll-then-read path saw `undefined`
|
|
912
|
+
and reported `{success: false, steps: 0}` on every completed task —
|
|
913
|
+
including the successful ones. One of the six headline tools was
|
|
914
|
+
silently broken in v0.9.4.
|
|
915
|
+
|
|
916
|
+
Fix: `AgentState` now has `lastResult?: TaskResult`. `executeTask()`
|
|
917
|
+
snapshots the result onto `state.lastResult` immediately before
|
|
918
|
+
resolving. Cleared at task start so pollers can't read stale data
|
|
919
|
+
while a new task is in flight. Test coverage: 4 new tests at
|
|
920
|
+
`src/__tests__/agent-last-result.test.ts`.
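The clear-then-snapshot ordering is the important part of this fix. The sketch below uses a synchronous stand-in for the real async `executeTask()`; the `status` field, `executeTaskSketch`, and `readLastResult` are hypothetical names, and `TaskResult` is reduced to two fields.

```typescript
interface TaskResult { success: boolean; steps: number }
interface AgentState { status: "idle" | "running"; lastResult?: TaskResult }

const agentState: AgentState = { status: "idle" };

// Synchronous stand-in for `executeTask()`; `run` is a hypothetical task body.
function executeTaskSketch(run: () => TaskResult): TaskResult {
  agentState.status = "running";
  delete agentState.lastResult;   // clear so pollers never read the previous task's result
  const result = run();
  agentState.lastResult = result; // snapshot before reporting idle
  agentState.status = "idle";
  return result;
}

// What a poller might do once the agent reports idle.
function readLastResult(): TaskResult {
  return agentState.lastResult ?? { success: false, steps: 0 };
}
```

Before the fix, the equivalent of `readLastResult()` always hit the fallback branch, which is why every completed task looked like `{success: false, steps: 0}`.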
|
|
921
|
+
|
|
922
|
+
### Fixed — silent screenshots on macOS 14+ via ScreenCaptureKit (PR #109)
|
|
923
|
+
|
|
924
|
+
macOS 26 Tahoe added a "screen captured" white-flash animation that
|
|
925
|
+
fires whenever any process hits the screencapture coordinator daemon —
|
|
926
|
+
including the deprecated `CGWindowListCreateImage` API our
|
|
927
|
+
`ScreenshotHelper` was using. For an agent tool that screenshots
|
|
928
|
+
dozens of times per session, every flash was both visually disruptive
|
|
929
|
+
and a privacy signal users didn't need to see for legitimate
|
|
930
|
+
automation.
|
|
931
|
+
|
|
932
|
+
New `captureFullScreenSCK` + `captureWindowSCK` functions use
|
|
933
|
+
ScreenCaptureKit (macOS 14+) which Tahoe's flash hook does NOT
|
|
934
|
+
intercept. JSON output shape preserved byte-for-byte; deployment
|
|
935
|
+
target stays `.macOS(.v12)` via runtime version gate. Falls back to
|
|
936
|
+
the existing CG path on macOS 12-13 where CG is still silent.
|
|
937
|
+
|
|
938
|
+
### Added — `prepare` script for clean npm publish
|
|
939
|
+
|
|
940
|
+
`package.json` now has `prepare: tsc && node dist/postbuild.js`. The
|
|
941
|
+
npm `prepare` lifecycle runs on `npm pack` / `npm publish`, so the
|
|
942
|
+
published tarball always reflects the current source rather than
|
|
943
|
+
shipping a stale `dist/` from the developer's last `npm run build`.
|
|
944
|
+
|
|
945
|
+
### Fixed — installer no longer destroys user state on dirty tree (PR #108, backfilled to v0.9.5)
|
|
946
|
+
|
|
947
|
+
The `irm https://clawdcursor.com/install.ps1 | iex` and equivalent
|
|
948
|
+
`curl … | bash` paths previously did a `git checkout && git pull` and,
|
|
949
|
+
on any non-zero exit, ran `rm -rf $INSTALL_DIR` and re-cloned from
|
|
950
|
+
scratch. Any uncommitted work in the user's tree — feature branches,
|
|
951
|
+
dirty edits, untracked scratch files — was destroyed with no consent
|
|
952
|
+
and no recovery path. The error message also lied about the cause: a
|
|
953
|
+
dirty tree, a missing ref, or a diverged branch all surfaced as
|
|
954
|
+
"Download failed. Check your internet and try again."
|
|
955
|
+
|
|
956
|
+
Both installers now refuse to update a dirty tree, surface the real
|
|
957
|
+
`git` stderr on failure, and never delete `$INSTALL_DIR` without
|
|
958
|
+
explicit user action. `install.ps1` also dropped UTF-8 em-dashes in
|
|
959
|
+
comments to fix a Windows-PowerShell-5.1 ANSI-decoding parser issue.
|
|
960
|
+
|
|
961
|
+
### Notes
|
|
962
|
+
|
|
963
|
+
- **macOS users installing via `npm i -g clawdcursor`**: the Swift
|
|
964
|
+
native helper (ClawdCursor.app) isn't pre-built in the npm tarball.
|
|
965
|
+
After install, run `cd $(npm root -g)/clawdcursor && bash native/build.sh && clawdcursor grant`
|
|
966
|
+
to build it. Or use the existing `irm | iex` installer which handles
|
|
967
|
+
this automatically. Fixing the npm-direct macOS path is on the
|
|
968
|
+
v0.9.6 list.
|
|
969
|
+
- Closed PR #94 (diagram improvements) — its scope was a subset of
|
|
970
|
+
#93's; the diagram updates folded in via the rebase.
|
|
971
|
+
|
|
972
|
+
|
|
973
|
+
## [0.9.4] - 2026-05-20 — external-agent reliability + browser DOM reachability
|
|
974
|
+
|
|
975
|
+
Two threads of work landed: a batch of reliability fixes surfaced by
|
|
976
|
+
an end-to-end live test (Sonnet driving clawdcursor over MCP-HTTP
|
|
977
|
+
against the public benchmark exam at clawdcursor.com/tests), and the
|
|
978
|
+
first round of fixes to the external-agent UX gap that test exposed.
|
|
979
|
+
|
|
980
|
+
### Live test summary
|
|
981
|
+
|
|
982
|
+
The exam at `192.168.1.127:8000` (14 desktop-control tasks: clicks,
|
|
983
|
+
drags, hover, double/right-click, typing, scroll-to-find, bezier path,
|
|
984
|
+
keyboard combo, multi-step workflow) was passed end-to-end by Sonnet
|
|
985
|
+
driving the compact MCP surface. Three runs:
|
|
986
|
+
|
|
987
|
+
- baseline (no hierarchy prompt): grade A, 39 screenshots, 2 a11y calls
|
|
988
|
+
- hierarchy prompted (no CDP fallback yet): grade A, 39 screenshots, 0 a11y successes — proved the underlying tools were canvas-blind
|
|
989
|
+
- post-CDP-fallback + `--compact`: ~20 CDP DOM hits including ★TARGET in the scroll-to-find task (saved ~285 wheel-scroll calls)
|
|
990
|
+
|
|
991
|
+
### Added — `clawdcursor agent --compact` (PR #106)
|
|
992
|
+
|
|
993
|
+
Previously the 6-compound MCP surface (`computer`, `accessibility`,
|
|
994
|
+
`window`, `system`, `browser`, `task`) was only reachable via
|
|
995
|
+
`clawdcursor mcp --compact` (stdio, for editor integrations). The
|
|
996
|
+
HTTP-MCP daemon at `:3847/mcp` was hard-coded to serve all 97 granular
|
|
997
|
+
tools — which silently broke the README's "6 compact tools" pitch for
|
|
998
|
+
any external agent connecting over HTTP. `clawdcursor agent --compact`
|
|
999
|
+
(or `CLAWD_MCP_COMPACT=1`) now exposes the same compound surface over
|
|
1000
|
+
HTTP. Default stays granular because the daemon dashboard at `/` calls
|
|
1001
|
+
9 granular tool names directly (`scheduled_task_*`, `agent_status`,
|
|
1002
|
+
`submit_task`, `favorites_*`, `logs_recent`) — flipping the default
|
|
1003
|
+
will follow once those calls migrate to the compound `system` action
|
|
1004
|
+
vocabulary.
|
|
1005
|
+
|
|
1006
|
+
### Added — CDP DOM fallback in `find_element` + `read_screen` (PR #107)
|
|
1007
|
+
|
|
1008
|
+
Edge / Chrome UIA trees stop at browser chrome — single-page apps and
|
|
1009
|
+
in-page DOM widgets are invisible to pure UIA queries. When the focused
|
|
1010
|
+
window is a recognised browser and clawdcursor's CDP driver is
|
|
1011
|
+
connected, `find_element` and `read_screen` now also query the DOM via
|
|
1012
|
+
`document.querySelectorAll('a, button, input, …, [aria-label], [role]')`
|
|
1013
|
+
and fold the matches into the response. `find_element` flags CDP
|
|
1014
|
+
results with a `(via CDP DOM; coords are viewport-relative)` header;
|
|
1015
|
+
`read_screen` appends a `BROWSER DOM` section side-by-side with the
|
|
1016
|
+
UIA tree. The smart-layer (`smart_click` / `smart_read` / `smart_type`)
|
|
1017
|
+
already had this fallback; the granular tools that external agents
|
|
1018
|
+
prefer when explicitly told "a11y first" did not. Now they do.
|
|
1019
|
+
|
|
1020
|
+
**Known limit.** CDP DOM only sees standard HTML elements. Canvas-
|
|
1021
|
+
rendered content (shapes drawn via 2D context or WebGL) remains
|
|
1022
|
+
vision-only and requires `computer.screenshot` + pixel coordinates.
|
|
1023
|
+
This is a platform limit, not a tool limit — `querySelectorAll` cannot
|
|
1024
|
+
enumerate pixels.
|
|
1025
|
+
|
|
1026
|
+
### Fixed — pipeline ladder climbs past rung LLM errors (PR #104)
|
|
1027
|
+
|
|
1028
|
+
`src/core/pipeline.ts` previously treated any "aborted" failure string
|
|
1029
|
+
as a hard user-abort, so a transient LLM timeout on the blind rung
|
|
1030
|
+
collapsed the whole chain — vision was effectively dead code on slow
|
|
1031
|
+
or flaky providers. Replaced the stringly-typed branch with a
|
|
1032
|
+
`RungFailureCategory` tagged-union (`user_abort` / `rung_llm_error` /
|
|
1033
|
+
`agent_gave_up` / `verifier_rejected` / `config_missing` /
|
|
1034
|
+
`anti_pattern` / `infra_error`) and a `categorizeFailureReason` mapper
|
|
1035
|
+
as the single source of truth. Chain-abort gate hard-aborts only on
|
|
1036
|
+
`user_abort`, `infra_error`, `anti_pattern`, or high-confidence
|
|
1037
|
+
`verifier_rejected`; everything else escalates to the next rung.
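The chain-abort gate described above can be sketched as a switch over the category. The category names come from the entry; the `RungFailure` shape, the function name, and the `HIGH_CONFIDENCE` threshold are assumptions, since the changelog does not say what counts as high-confidence.

```typescript
type RungFailureCategory =
  | "user_abort" | "rung_llm_error" | "agent_gave_up" | "verifier_rejected"
  | "config_missing" | "anti_pattern" | "infra_error";

interface RungFailure { category: RungFailureCategory; confidence?: number }

// Assumed threshold for "high-confidence"; the real value is not given.
const HIGH_CONFIDENCE = 0.8;

// True when the whole chain should stop; false means "escalate to the next rung".
function shouldAbortChain(failure: RungFailure): boolean {
  switch (failure.category) {
    case "user_abort":
    case "infra_error":
    case "anti_pattern":
      return true;
    case "verifier_rejected":
      return (failure.confidence ?? 0) >= HIGH_CONFIDENCE;
    default:
      // rung_llm_error, agent_gave_up, config_missing: try the next rung
      return false;
  }
}
```

Switching on a closed set of categories, rather than matching substrings such as "aborted", is what keeps a transient LLM timeout from being mistaken for a user abort.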
|
|
1038
|
+
|
|
1039
|
+
Verified live: pointing the daemon at an unreachable LLM URL produced
|
|
1040
|
+
`blind → hybrid → vision` rung attempts where the previous chain-abort
|
|
1041
|
+
gate stopped after rung 1. Also fixed a related phantom-success bug
|
|
1042
|
+
where aggregate accounting could mark a task `success: true` when every
|
|
1043
|
+
rung had failed with `rung_llm_error`. 4 integration tests +
|
|
1044
|
+
7 mapper unit tests added at `src/__tests__/pipeline-chain-abort.test.ts`.
|
|
1045
|
+
|
|
1046
|
+
### Fixed — blind-mode coordinate-click guardrail (PR #103)
|
|
1047
|
+
|
|
1048
|
+
The autonomous agent's blind rung (a11y-only, no screenshots) was
|
|
1049
|
+
emitting raw `mouse_click(x, y)` calls with hallucinated coordinates
|
|
1050
|
+
when the a11y tree didn't contain the LLM's target — a live test
|
|
1051
|
+
observed it walking through an exam UI by guessing positions until the
|
|
1052
|
+
verifier's 0.65-confidence rejection finally fired. New block at
|
|
1053
|
+
`src/core/agent-loop/agent.ts:531-587`: when `mode === 'blind'` and no
|
|
1054
|
+
a11y-aware selector (`invoke_element`, `set_field_value`,
|
|
1055
|
+
`focus_element`, `a11y_select`, `a11y_toggle`, `a11y_expand`,
|
|
1056
|
+
`a11y_collapse`, `wait_for_element`, `find_element`) succeeded in the
|
|
1057
|
+
prior 2 turns, raw coordinate clicks are refused with a structured
|
|
1058
|
+
tool-result that points the LLM at the recovery options
|
|
1059
|
+
(`cannot_read` or `screenshot`). 4 regression tests at
|
|
1060
|
+
`src/__tests__/blind-coord-click-guard.test.ts`.
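A simplified version of the guard could look like the following. The selector list is taken from the entry; `TurnRecord`, the function name, the refusal wording, and the assumption of one tool call per turn are all illustrative.

```typescript
const A11Y_AWARE_TOOLS = new Set<string>([
  "invoke_element", "set_field_value", "focus_element", "a11y_select", "a11y_toggle",
  "a11y_expand", "a11y_collapse", "wait_for_element", "find_element",
]);

interface TurnRecord { tool: string; ok: boolean }

// Hypothetical guard: returns a refusal message, or null when the click may proceed.
function guardBlindCoordClick(mode: string, tool: string, history: TurnRecord[]): string | null {
  if (mode !== "blind" || tool !== "mouse_click") return null;
  const recent = history.slice(-2); // the prior 2 turns (one tool call per turn assumed)
  const grounded = recent.some((turn) => turn.ok && A11Y_AWARE_TOOLS.has(turn.tool));
  if (grounded) return null;
  return "Refused: no a11y-aware selector succeeded in the prior 2 turns. Use cannot_read or screenshot.";
}
```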
|
|
1061
|
+
|
|
1062
|
+
### Fixed — CLI `--text-model` / `--api-key` / `--base-url` ignored (PR #105)
|
|
1063
|
+
|
|
1064
|
+
The boot banner read these flags through `resolveConfig`
|
|
1065
|
+
(`src/llm/config.ts:203`) and proudly printed
|
|
1066
|
+
`Using externally configured models: text=X`, but the runtime agent
|
|
1067
|
+
loop read from `loadPipelineConfig` (`src/surface/doctor.ts:1636`)
|
|
1068
|
+
which only consulted `.clawdcursor-config.json` — so the very next log
|
|
1069
|
+
line was `pipeline.start … models=text=off`. `loadPipelineConfig` now
|
|
1070
|
+
accepts an optional `ResolvedConfig` overlay; fields tagged
|
|
1071
|
+
`source === 'cli'` override disk values. Precedence preserved
|
|
1072
|
+
(CLI > project > user > env > autodetect > default). The contradictory
|
|
1073
|
+
double banner (`No AI providers found` immediately followed by
|
|
1074
|
+
`Using externally configured models`) is also gone — the
|
|
1075
|
+
auto-detection branch is skipped when CLI flags already supply LLM
|
|
1076
|
+
wiring. 5 regression tests at
|
|
1077
|
+
`src/__tests__/load-pipeline-config-overlay.test.ts`.
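The overlay rule (only fields tagged `source === 'cli'` override disk values) can be sketched like this. The interface names carry a `Sketch` suffix because the real `ResolvedConfig` and pipeline config shapes are not shown; the three field names mirror the CLI flags and are otherwise assumptions.

```typescript
type ConfigSource = "cli" | "project" | "user" | "env" | "autodetect" | "default";
interface ResolvedField { value: string; source: ConfigSource }
interface ResolvedConfigSketch { textModel?: ResolvedField; apiKey?: ResolvedField; baseUrl?: ResolvedField }
interface PipelineConfigSketch { textModel?: string; apiKey?: string; baseUrl?: string }

// Starts from the on-disk config and lets only CLI-sourced fields override it.
function applyCliOverlay(disk: PipelineConfigSketch, overlay?: ResolvedConfigSketch): PipelineConfigSketch {
  const merged: PipelineConfigSketch = { ...disk };
  if (!overlay) return merged;
  if (overlay.textModel && overlay.textModel.source === "cli") merged.textModel = overlay.textModel.value;
  if (overlay.apiKey && overlay.apiKey.source === "cli") merged.apiKey = overlay.apiKey.value;
  if (overlay.baseUrl && overlay.baseUrl.source === "cli") merged.baseUrl = overlay.baseUrl.value;
  return merged;
}
```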
|
|
1078
|
+
|
|
1079
|
+
### Fixed — `smart_click` candidates + macOS multi-window + open_url tier + a11y description fallback (PR #102, closes #101)
|
|
1080
|
+
|
|
1081
|
+
Four problems reported in issue #101:
|
|
1082
|
+
|
|
1083
|
+
- `smart_click` now returns a structured failure payload
|
|
1084
|
+
`{error, reason, target, candidates, tried, elapsedMs, isError: true}`
|
|
1085
|
+
instead of bare timeout strings. Callers that hit an ambiguous target
|
|
1086
|
+
can disambiguate from the candidate list; deadline-aware budget
|
|
1087
|
+
replaces the bare `Promise.race` that previously swallowed diagnostic
|
|
1088
|
+
state. New tests at `src/__tests__/smart-tools.test.ts`.
|
|
1089
|
+
|
|
1090
|
+
- macOS `focus_window` now disambiguates among multiple windows of the
|
|
1091
|
+
same process by title — `scripts/mac/_window-picker.jxa` plus a
|
|
1092
|
+
`scoreWindow()` heuristic that deprioritises tray-style popovers
|
|
1093
|
+
(Xcode "Downloads", etc.).
|
|
1094
|
+
|
|
1095
|
+
- `open_url` was filtered out of the act-only safety tier; the
|
|
1096
|
+
`safetyTier: 2 → 1` change in `src/tools/extras.ts:523` restores it.
|
|
1097
|
+
|
|
1098
|
+
- A11y element labels now fall back through `name → description →
|
|
1099
|
+
value → ''` so macOS apps that put their visible text in
|
|
1100
|
+
`AXDescription` (Xcode, others) render with something meaningful
|
|
1101
|
+
instead of `"missing value"`. `formatElement()` helper in
|
|
1102
|
+
`src/tools/a11y.ts:25-30`.
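The label fallback in the last item can be sketched as below. This is not the real `formatElement()`; the element shape and helper names are hypothetical, and it assumes the literal text `missing value` can reach the formatter, as the entry suggests.

```typescript
interface A11yElementSketch {
  role: string;
  name?: string | null;
  description?: string | null;
  value?: string | null;
}

// Treats absent fields and the literal "missing value" text as empty.
function usableText(text: string | null | undefined): string {
  if (text === null || text === undefined) return "";
  const trimmed = text.trim();
  return trimmed === "missing value" ? "" : trimmed;
}

// Falls back through name → description → value → ''.
function elementLabel(el: A11yElementSketch): string {
  return usableText(el.name) || usableText(el.description) || usableText(el.value) || "";
}
```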
|
|
1103
|
+
|
|
1104
|
+
### Repo hygiene
|
|
1105
|
+
|
|
1106
|
+
Closed security-audit issue #13 with the per-commit fix-mapping comment.
|
|
1107
|
+
Rejected SafeSkill scanner PR #92 (the 20/100 "Blocked" badge was
|
|
1108
|
+
based on a heuristic that flags ANSI terminal color escapes as
|
|
1109
|
+
obfuscated content — see `src/surface/cli.ts`, `src/surface/doctor.ts`,
|
|
1110
|
+
etc. for the 58+ legitimate ANSI escapes). Closed issue #101.
|
|
1111
|
+
|
|
1112
|
+
Five dependabot bumps landed: `tsx` 4.21→4.22, `ws` 8.20.0→8.20.1
|
|
1113
|
+
(security patch), `croner` 9→10 (major, breaking change does not
|
|
1114
|
+
affect this codebase — only `?` wildcard semantics changed),
|
|
1115
|
+
`eslint` group +3 updates, `@types/node` 25.7→25.9.
|
|
1116
|
+
|
|
1117
|
+
|
|
1118
|
+
## [0.9.3] - 2026-05-16 — tool-layer fixes + live-test report
|
|
1119
|
+
|
|
1120
|
+
Three critical tool-layer fixes surfaced by a deep audit + a Windows
|
|
1121
|
+
encoding bug spotted during an end-to-end live test (run by an LLM
|
|
1122
|
+
driving the compact MCP surface from Claude Code). Also: README hero
|
|
1123
|
+
no longer leads with "fallback only" framing — that discipline stays
|
|
1124
|
+
in SKILL.md (where it belongs for AI agents) and in a new "When NOT
|
|
1125
|
+
to use it" section in the README body.
|
|
1126
|
+
|
|
1127
|
+
### Fixed — Linux SIGSEGV on MCP stdin teardown (carried from `3fc76b8`)
|
|
1128
|
+
|
|
1129
|
+
Calling `process.exit()` synchronously inside a stdin `'end'` event
|
|
1130
|
+
handler segfaulted on Linux because libuv was still unwinding the
|
|
1131
|
+
stream read handle. `releaseMcp` now guards against double-fire and
|
|
1132
|
+
defers exit via `setImmediate`. Fixes the cross-platform CI on
|
|
1133
|
+
ubuntu-latest (Node 20 + 22).
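The guard-and-defer pattern can be sketched with the exit and scheduling functions injected, so the example does not depend on Node globals. `makeReleaseOnce` is a hypothetical name; in the real `releaseMcp`, `defer` would correspond to `setImmediate` and `exit` to a `process.exit()` call.

```typescript
// Hypothetical factory; `exit` and `defer` are injected so the sketch runs anywhere.
function makeReleaseOnce(exit: () => void, defer: (fn: () => void) => void): () => void {
  let released = false;
  return () => {
    if (released) return; // guard against double-fire
    released = true;
    defer(exit);          // let the current stream callback unwind before exiting
  };
}
```

The point of deferring is that the exit no longer runs on the same stack as the stdin `'end'` handler.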
|
|
1134
|
+
|
|
1135
|
+
### Fixed — `navigate_browser` PowerShell shell injection (Win32 branch)
|
|
1136
|
+
|
|
1137
|
+
`src/tools/orchestration.ts` interpolated the URL into a
|
|
1138
|
+
`Start-Process … -ArgumentList @(…,"${url}")` PowerShell command. A URL
|
|
1139
|
+
containing `")` or `$()` or backticks could escape the quoting and
|
|
1140
|
+
execute arbitrary PowerShell. Replaced with a direct `execFile()`
|
|
1141
|
+
against `msedge.exe` resolved from standard install locations — no
|
|
1142
|
+
shell shim, argv is safe. macOS and Linux branches already used
|
|
1143
|
+
argv-form `execFile` and were not affected.
|
|
1144
|
+
|
|
1145
|
+
### Fixed — `screenshot_full` MIME type lied
|
|
1146
|
+
|
|
1147
|
+
`src/tools/agent.ts` declared `mimeType: 'image/png'` and described the
|
|
1148
|
+
output as base64 PNG, but `captureForLLM()` returns JPEG by default
|
|
1149
|
+
(or PNG only when `CLAWD_SCREENSHOT_FORMAT=png`). Any client that
|
|
1150
|
+
decoded the bytes by the advertised type silently corrupted the image. The
|
|
1151
|
+
`image.mimeType` field now follows the actual `frame.format`; a new
|
|
1152
|
+
`format` field in the metadata block lets clients double-check.
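In outline, the fix derives the MIME type from the captured frame instead of hard-coding it. The shapes and the `toImageContent` name below are assumptions, as are the `"jpeg"` / `"png"` format strings; only the `image.mimeType` and `format` fields are named in the entry.

```typescript
type FrameFormat = "jpeg" | "png";
interface FrameSketch { format: FrameFormat; base64: string }

// Builds the tool response so the advertised type always matches the bytes.
function toImageContent(frame: FrameSketch) {
  return {
    image: { data: frame.base64, mimeType: `image/${frame.format}` },
    metadata: { format: frame.format }, // lets clients double-check
  };
}
```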
|
|
1153
|
+
|
|
1154
|
+
### Fixed — `learn_app` silent no-op
|
|
1155
|
+
|
|
1156
|
+
The handler returned `{saved: true}` even when neither save branch
|
|
1157
|
+
executed (e.g., the caller supplied only `processName`). Now tracks
|
|
1158
|
+
`wroteLesson`/`wroteGuide` flags and returns
|
|
1159
|
+
`{saved: false, reason: …, isError: true}` when nothing was persisted.
|
|
1160
|
+
New regression-guard test at `agent-tools.test.ts`.
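The flag-tracking approach can be illustrated as follows. Only `processName`, `wroteLesson`, `wroteGuide`, and the result fields come from the entry; the `lesson` / `guide` argument names, the in-memory store, and the reason text are assumptions.

```typescript
interface LearnAppArgs { processName: string; lesson?: string; guide?: string }
type LearnAppResult = { saved: true } | { saved: false; reason: string; isError: true };

// Reports success only when at least one save branch actually ran.
function learnAppSketch(args: LearnAppArgs, store: { lessons: string[]; guides: string[] }): LearnAppResult {
  let wroteLesson = false;
  let wroteGuide = false;
  if (args.lesson) { store.lessons.push(args.lesson); wroteLesson = true; }
  if (args.guide) { store.guides.push(args.guide); wroteGuide = true; }
  if (!wroteLesson && !wroteGuide) {
    return { saved: false, reason: "no lesson or guide supplied", isError: true };
  }
  return { saved: true };
}
```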
|
|
1161
|
+
|
|
1162
|
+
### Fixed — Windows window-title UTF-8 corruption
|
|
1163
|
+
|
|
1164
|
+
Confirmed live: every `window.list`/`window.active` call returned
|
|
1165
|
+
non-ASCII characters in window titles as `?` or `�` (the Unicode
|
|
1166
|
+
replacement character). Root cause: `scripts/ps-bridge.ps1` and
|
|
1167
|
+
`scripts/ocr-recognize.ps1` did not set `[Console]::OutputEncoding`,
|
|
1168
|
+
so PowerShell wrote in the system code page (Windows-1252 in most
|
|
1169
|
+
locales) while Node decoded as UTF-8. Both scripts now force UTF-8 on
|
|
1170
|
+
stdin/stdout and `$OutputEncoding`. Same fix benefits OCR text capture
|
|
1171
|
+
of non-ASCII content (emoji, accented characters, CJK).
|
|
1172
|
+
|
|
1173
|
+
### Fixed — compact `direction` enum dropped `scroll_horizontal` values
|
|
1174
|
+
|
|
1175
|
+
`buildCompoundSchema` in `src/tools/compact.ts` was first-wins on
|
|
1176
|
+
field names across delegates: `mouse_scroll` declared
|
|
1177
|
+
`direction: ['up','down']` first and won, so `mouse_scroll_horizontal`'s
|
|
1178
|
+
`['left','right']` was silently invisible on the compact surface. An
|
|
1179
|
+
LLM calling `computer({action:'scroll_horizontal', direction:'left'})`
|
|
1180
|
+
was violating the published schema. The merge now unions enum values
|
|
1181
|
+
across delegates.
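A minimal sketch of a union-style merge, assuming a simplified schema shape (string fields with an optional `enum`) and a hypothetical function name; the real `buildCompoundSchema` handles more than this. Fields where either side lacks an enum are left as first seen, which is a simplification.

```typescript
interface FieldSchema { type: "string"; enum?: string[] }
type DelegateSchema = { [field: string]: FieldSchema };

// Merges per-delegate field schemas; enum values are unioned instead of first-wins.
function mergeDelegateFields(delegates: DelegateSchema[]): DelegateSchema {
  const merged: DelegateSchema = {};
  for (const delegate of delegates) {
    for (const field of Object.keys(delegate)) {
      const incoming = delegate[field];
      if (!incoming) continue;
      const existing = merged[field];
      if (!existing) {
        const copy: FieldSchema = { type: "string" };
        if (incoming.enum) copy.enum = incoming.enum.slice();
        merged[field] = copy;
        continue;
      }
      if (incoming.enum && existing.enum) {
        for (const value of incoming.enum) {
          if (existing.enum.indexOf(value) === -1) existing.enum.push(value);
        }
      }
    }
  }
  return merged;
}
```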
|
|
1182
|
+
|
|
1183
|
+
### Improved — `task` and `delegate_to_agent` descriptions lead with the daemon requirement
|
|
1184
|
+
|
|
1185
|
+
Both tools return ECONNREFUSED (or "no agent") when called from a
|
|
1186
|
+
stdio MCP host (Cursor, Claude Code, Windsurf) because they HTTP-call
|
|
1187
|
+
`127.0.0.1:3847/mcp` on the daemon. Their descriptions now lead with
|
|
1188
|
+
**Requires the `clawdcursor agent` daemon to be running** and tell
|
|
1189
|
+
the consumer how to start it.
|
|
1190
|
+
|
|
1191
|
+
### Repositioned — README hero
|
|
1192
|
+
|
|
1193
|
+
The "Use as a fallback, not first choice" callout no longer sits in
|
|
1194
|
+
the README hero. The same discipline stays in SKILL.md (the AI-facing
|
|
1195
|
+
manual, where it correctly disciplines agent behavior) and in a new
|
|
1196
|
+
"When NOT to use it" subsection inside README's `Why Clawd Cursor`
|
|
1197
|
+
block. The hero now leads with what it does. SKILL.md frontmatter is
|
|
1198
|
+
unchanged — it still leads with the strict 4-gate for agent
|
|
1199
|
+
consumers.

### Added — live test report

`docs/internal/0.9.2-live-test-2026-05-16.md` documents a full
end-to-end test of clawdcursor 0.9.2, run by an LLM consuming the
compact MCP surface from Claude Code. It covers every compact compound,
the HTTP MCP transport via a parallel daemon, what worked, what was
surprising, and what is broken. Reference artifact for the trust story —
something a curious visitor can read to see "yes, this has been
actually tested by an AI agent driving a real desktop."

### Internal — security audit reply draft

`docs/internal/issue-13-reply-draft.md` is a draft response to the
long-open security audit issue, listing what has landed in 0.9.x to
address each item. The maintainer reviews, edits, and posts it to GitHub.

### Test coverage

51 test files, 813 tests pass (was 812 — `+1` new regression guard for
`learn_app`'s no-payload case). Typecheck clean. Lint stable at 18
pre-existing warnings.

## [0.9.2] - 2026-05-15 — reliability + scanner-friendliness

Multiple fixes and a refactor consolidated into one release.

### Fixed — recycled-PID false positives in single-instance lock

User-reported on Windows 11 + Claude Code: `/mcp` reconnect
intermittently failed with `Failed to reconnect to clawdcursor: -32000`,
and once it broke, every subsequent reconnect failed too — until the
user manually killed zombie node processes and ran
`rm ~/.clawdcursor/mcp.pid`.

`isProcessAlive(pid)` used `process.kill(pid, 0)`, which on Windows is
fooled by PID recycling: once the dead clawdcursor's PID was reassigned
to any other live process (chrome, svchost, anything), the lockfile
permanently looked "live" and refused all future spawns. The lockfile
also stored only a bare integer PID, leaving no way to disambiguate.

`~/.clawdcursor/{start,mcp,serve}.pid` is now JSON with schema version,
PID, **process start time**, and mode. `claimPidFile` requires the
recorded start time to match the OS-reported start time of the live PID
(±5 s tolerance for OS reporting jitter) before treating it as a real
duplicate. Implementation extracted to `src/surface/pidfile.ts` with
unit-test coverage. Legacy bare-integer lockfiles are treated as stale
on first read (silent backwards-compat — the old format can't be
trusted anyway).
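
The start-time check can be sketched as two pure functions. Only the `v` key appears elsewhere in this changelog; the other field names (`startedAtMs`, `mode`) and both function names are assumptions made for illustration, and how the OS start time is obtained is left out.

```typescript
// Hypothetical record shape: the changelog lists the fields but not their keys
// (apart from `v`), so `startedAtMs` and `mode` are assumed names.
interface PidRecord { v: number; pid: number; startedAtMs: number; mode: string }

const START_TIME_TOLERANCE_MS = 5000; // the ±5 s tolerance mentioned above

// Parse lockfile contents. Legacy bare-integer files and malformed JSON both
// yield null, which the caller treats as a stale lock.
function parsePidRecord(raw: string): PidRecord | null {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data !== "object" || data === null) return null;
    const r = data as Partial<PidRecord>;
    if (typeof r.v !== "number" || typeof r.pid !== "number" ||
        typeof r.startedAtMs !== "number" || typeof r.mode !== "string") return null;
    return { v: r.v, pid: r.pid, startedAtMs: r.startedAtMs, mode: r.mode };
  } catch {
    return null;
  }
}

// `osStartTimeMs` is the start time the OS reports for the live process that
// currently holds record.pid, or null when no such process exists.
function isRealDuplicate(record: PidRecord | null, osStartTimeMs: number | null): boolean {
  if (record === null || osStartTimeMs === null) return false;
  return Math.abs(record.startedAtMs - osStartTimeMs) <= START_TIME_TOLERANCE_MS;
}
```

A recycled PID fails the check because the unrelated process that inherited the number started at a different time than the one recorded in the lockfile.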

### Fixed — orphan MCP processes block reconnect

When an editor host exited without reaping its `clawdcursor mcp` child,
the orphan kept running with no usable stdio but legitimately matched
the lockfile. The `mcp` command now treats stdin EOF / close / error as
a hard exit signal: when the parent's stdio pipe closes, the orphan
releases its lockfile and exits cleanly. Deterministic on every
platform — no polling, no parent-PID inspection.
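
A hedged sketch of the stdin-as-liveness-signal pattern. `StreamLike` is a minimal structural type standing in for `process.stdin`, and `exitWhenParentGone` is a hypothetical name; the real `mcp` command may wire this differently.

```typescript
// Minimal structural type so the sketch does not depend on Node typings;
// in a real process the argument would be process.stdin.
interface StreamLike {
  on(event: string, listener: () => void): unknown;
}

// Run `shutdown` exactly once, the first time the parent's pipe ends,
// closes, or errors. Several of these events can fire for one disconnect.
function exitWhenParentGone(stdin: StreamLike, shutdown: () => void): void {
  let done = false;
  const once = (): void => {
    if (done) return;
    done = true;
    shutdown();
  };
  for (const event of ["end", "close", "error"]) {
    stdin.on(event, once);
  }
}
```

In a Node process the call would look roughly like `exitWhenParentGone(process.stdin, () => { releaseLock(); process.exit(0); })`, where `releaseLock` is a placeholder for whatever removes the lockfile.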

### Fixed — `clawdcursor uninstall` silently failed to kill running processes

The uninstall command's pidfile fallback (`src/surface/cli.ts`) still
parsed the lockfile with `parseInt`, which against the new JSON format
(`{"v":1,...}`) returns `NaN`, silently skipping the kill. A user
running `clawdcursor uninstall` while a clawdcursor process was alive
would end up with deleted config + orphaned process. Now uses the
shared `readPidLoose` helper that handles both new JSON and legacy
bare-int formats.
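
To illustrate why `parseInt` fails here and what a tolerant reader might look like, here is a sketch. Only the name `readPidLoose` and the `v` key come from the changelog; the body and the `pid` key are assumptions.

```typescript
// Hypothetical re-implementation; the real helper may differ.
function readPidLoose(raw: string): number | null {
  const text = raw.trim();
  // Legacy format: the file is just the PID as decimal digits.
  if (/^\d+$/.test(text)) return Number(text);
  // New format: a JSON object with (assumed) numeric `pid` key.
  try {
    const data: unknown = JSON.parse(text);
    if (typeof data === "object" && data !== null) {
      const pid = (data as { pid?: unknown }).pid;
      if (typeof pid === "number" && Number.isInteger(pid) && pid > 0) return pid;
    }
  } catch {
    // Not JSON either: fall through and report "no usable PID".
  }
  return null;
}

// parseInt stops at the first non-numeric character, and "{" is the first one.
const naive = parseInt('{"v":1,"pid":4242}', 10);
```

`naive` is `NaN`, which is why the old fallback skipped the kill without raising any error.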

### Fixed — dashboard credential redaction silently broken since 0.7.x

`looksLikeCredential` in `src/surface/dashboard.ts` is supposed to
hide password-shaped strings (`password: secret`, `Bearer xxxx`, etc.)
from the task-history UI. The patterns were declared inside an outer JS
template literal, so the single backslashes in `\s` and `\S` were
silently dropped at parse time — the runtime regex matched literal `s`
and `S` characters instead of whitespace. **No password the regex was
designed to catch was actually being caught.** Patterns now use `\\s` /
`\\S` in source so the emitted JS gets the correct escapes; verified
end-to-end with a runtime regex eval.
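
The escape-dropping behavior is easy to reproduce in isolation. The pattern below is an invented example in the spirit of the description, not the actual regex from `src/surface/dashboard.ts`.

```typescript
// A regex literal embedded as text inside a template literal, as happens when
// client-side JS is emitted from a server-side string.
const broken = `/password:\s*\S+/`;   // `\s` and `\S` are not string escapes, so the backslash is dropped
const fixed = `/password:\\s*\\S+/`;  // `\\` survives as one backslash in the emitted text

// Strip the surrounding slashes and build the regex the emitted code would contain.
const toRegex = (literal: string): RegExp => new RegExp(literal.slice(1, -1));

const brokenMatches = toRegex(broken).test("password: secret");
const fixedMatches = toRegex(fixed).test("password: secret");
```

The broken variant compiles to `password:s*S+`, which looks for literal `s` and `S` characters and therefore does not match `password: secret`.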

### Refactor — migrate ANSI escape codes to picocolors

Replaced 58 inline `\x1b[NNm` ANSI styling literals across
`src/surface/{cli,doctor,onboarding,readiness}.ts` and
`src/core/observability/logger.ts` with `picocolors` calls. Same visual
output (picocolors emits the same standard ANSI codes at runtime, with
semantic close codes — `[22m` for bold-off, `[39m` for color-default —
instead of heavy-handed `[0m` everywhere, which actually composes
better when colors nest).

Motivation: third-party static analyzers (SafeSkill etc.) flagged
inline `\x1b` hex escapes as "potentially obfuscated content" — a
malware-detection heuristic that doesn't account for the fact that any
CLI with colored output uses exactly that syntax. Routing through
picocolors moves the escape codes into a vetted dependency, so source
scanners no longer see them as suspicious literals. Added
`picocolors@^1.1.1` (zero-deps, ~3 KB).

The logger's `C` color table is now keyed to picocolors style
functions instead of raw escape strings; `colorize`, `layerTag`,
`mapStrategyTag` updated accordingly. The ANSI-stripping regex in
`pad()` is built from `String.fromCharCode(27)` instead of `\x1b` so
the source itself carries no hex escape.
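
A sketch of building the ANSI-stripping regex without a hex escape in source. The `pad` signature shown here is an assumption; only the use of `String.fromCharCode(27)` comes from the text above.

```typescript
// Build the ESC character at runtime so the source file contains no hex escape.
const ESC = String.fromCharCode(27);

// Matches SGR styling sequences such as ESC[1m, ESC[22m and ESC[39m.
const ANSI_SGR = new RegExp(ESC + "\\[[0-9;]*m", "g");

// Pad to a visible width, ignoring styling codes when measuring length.
// (Assumed signature; the logger's real pad() may take different arguments.)
function pad(text: string, width: number): string {
  const visibleLength = text.replace(ANSI_SGR, "").length;
  return text + " ".repeat(Math.max(0, width - visibleLength));
}

const boldHi = ESC + "[1m" + "hi" + ESC + "[22m";
const padded = pad(boldHi, 5);
```

Measuring the stripped string keeps columns aligned even when some cells are styled and others are not.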

Platform-layer control-char sanitization regexes (`/[\r\n\t\x00-\x1f]/`)
in `src/platform/*.ts` are intentionally **not** migrated — those are
input filters, not styling, and aren't what static analyzers were
flagging as critical.

### Docs — SKILL.md frontmatter leads with FALLBACK ONLY

The frontmatter `description` field — what skill registries and AI
tool indexes display before an agent opens the file — now leads with
"FALLBACK ONLY" + the explicit numbered 4-gate (native API → CLI →
file edit → existing browser automation), instead of the softer "skill
of last resort that gives AI agents eyes…" wording that front-loaded
the capability claim. The body content already had the same 4-gate
(lines 46–54 and 197–208); this aligns the frontmatter with that body
messaging. PR #95.

### Internal — release-time version sync

`scripts/sync-version.ts` reads `package.json` at release time and
propagates the version into `SKILL.md` frontmatter, `docs/index.html`
hero/footer, and the install script header pins. Wired into npm's
`version` lifecycle hook so `npm version <bump>` updates everything
in one shot. Removes drift opportunity between `package.json` and the
website / SKILL frontmatter that previously had to be hand-synced.

### Internal — tool-count cleanup

User-visible runtime output and the marketing site previously claimed
89 or 93 tools in places where the actual catalog was 97. `doctor.ts`
post-success panel and `docs/index.html` hero/spec/mode-stats now match
the registry. Historical "What's new" entries (e.g. v0.9.0's "89
granular + 6 compact") are left as-is — they're accurate to the
release they describe.

### Migration

No action needed for fresh installs. A user already on a broken
PID-lock state should update, then a single `rm ~/.clawdcursor/mcp.pid`
(or `clawdcursor stop`) clears the legacy lockfile the prior version
left behind. From then on the new code self-heals.

## [0.9.1] - 2026-05-14 — compose-send fix + scheduled tasks

A user-reported regression on macOS plus a long-missing daemon feature. No
breaking changes; safe upgrade from v0.9.0.

### Fixed — compose-send playbook (real user-reported bug)

A v0.9.0 user on macOS asked "open mail app and send an email to X
introducing yourself." The trace reported `✅ done · path=playbook · 2/2
subtasks · $0.0000`, but the actual send was broken: the body landed in
the wrong field (and/or merged with the subject field). **No LLM was
called and no vision fallback ever fired** — the bug was 100% in the
deterministic playbook plus a verifier bypass that let the playbook
self-certify. Three layered fixes, plus a clearer summary line:

- **Platform-aware Tab count after recipient** in
  `src/tools/playbooks/compose-send.ts`. The previous code fired TWO Tabs
  after typing the recipient, assuming every mail app shows Cc/Bcc inline.
  macOS Mail.app's default layout has Cc/Bcc collapsed — Tab order is
  `To → Subject → Body`. Two Tabs overshot Subject and landed on Body.
  New: 1 Tab on darwin/linux, 3 Tabs on win32 (Outlook desktop default),
  via a `tabsAfterRecipient()` helper. Documented per-platform in the
  module header.
- **Decoupled the post-subject Tab from `if (subject)`**. The advance to
  Body now fires unconditionally so a task with no explicit subject (the
  user's "introducing yourself" case) still lands the body in the right
  field instead of typing it into whatever the previous Tab happened to
  leave focus on.
- **Removed playbook exemption from the verifier** in
  `src/core/pipeline.ts:649-655`. The router exemption stays (router has
  its own window-list-diff evidence). Playbooks now go through the
  ground-truth verifier like every other rung — the rich `send_email`
  task assertions (`compose_closed` via full window list, `recipient_visible`,
  `not_just_saved_as_draft` anti-signal) were designed for exactly this
  bug class but couldn't catch it because they never ran. Verifier is
  <500ms; soft-fail-on-low-confidence policy stays in place for legitimate
  idempotent operations.
- **Better summary line**: `compose-send: to=… subject=… body=…ch
  tabs-after-to=…` now reports parsed field state and platform Tab count
  in the trailing PIPELINE_DONE line. Empty subject was the original
  diagnostic signal in the user-reported bug — now it's visible at a
  glance.
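
The first two fixes can be sketched together. `tabsAfterRecipient` is named in the changelog and its per-platform counts are taken from the text; `composeKeys` and the `type:` / `key:` notation are invented here purely to make the key order visible.

```typescript
type Platform = "darwin" | "linux" | "win32";

// Tab presses needed to move focus from the recipient field to Subject
// (counts as stated above: 1 on darwin/linux, 3 on win32).
function tabsAfterRecipient(platform: Platform): number {
  return platform === "win32" ? 3 : 1;
}

// Build the key sequence for a compose-send run. The string notation is a
// made-up stand-in for whatever the playbook actually dispatches.
function composeKeys(platform: Platform, to: string, body: string, subject?: string): string[] {
  const keys: string[] = [`type:${to}`];
  for (let i = 0; i < tabsAfterRecipient(platform); i++) keys.push("key:Tab");
  if (subject) keys.push(`type:${subject}`);
  keys.push("key:Tab"); // advance to Body whether or not a subject was typed
  keys.push(`type:${body}`);
  return keys;
}
```

Because the final Tab is outside the `if (subject)` branch, a task without a subject still ends with focus in Body before the body text is typed.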

### Added — Scheduled tasks (new feature, requested)

Cron-driven recurring tasks that fire through the same agent pipeline as
`submit_task`. Persisted across daemon restarts. **Dashboard gets a new
⏰ Scheduled tab** with cron + task inputs, an active-schedule list, and
per-row pause / delete buttons.

- **`src/tools/scheduler.ts`** — 4 new MCP tools:
  - `scheduled_task_create({ task, cron, tz? })` — validates the cron up
    front (`croner`), persists, registers an in-process cron job that
    dispatches via `agent.executeTask`.
  - `scheduled_task_list()` — returns every persisted task with run /
    skip / lastError counters and a computed `nextRun` ISO timestamp.
  - `scheduled_task_delete({ id })` — unregisters + removes from disk.
  - `scheduled_task_toggle({ id, enabled })` — pause/resume without
    deleting; disabled tasks stay persisted but their cron job is
    unregistered.
- **Storage**: `~/.clawdcursor/scheduled-tasks.json`. Path is computed
  dynamically (honors `CLAWD_HOME`) so tests and forks can redirect.
- **Reentrancy**: if a tick fires while the agent is busy, the task is
  skipped and `skipCount` increments. No queue, no pile-up. Predictable.
- **Boot lifecycle**: `clawdcursor agent` calls `initScheduler(agent)` on
  startup (only when an LLM is configured — the scheduler requires the
  autonomous agent to dispatch into). Daemon shutdown calls
  `stopScheduler()` to cleanly unregister all jobs.
- **Auth**: every scheduler tool sits behind the same bearer-token gate
  as the rest of the MCP HTTP surface (`/mcp` already wraps `requireAuth`).
- **Dependency**: adds `croner@^9.1.0` (zero-dep cron parser, ~7 KB).
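
The skip-when-busy reentrancy rule can be sketched without any cron library. `skipCount` is named above; `runCount`, the `ScheduledTask` shape, and `makeTick` are illustrative assumptions, and `execute` stands in for the dispatch into the agent.

```typescript
// Assumed shape; only `skipCount` and `lastError` are named in the changelog.
interface ScheduledTask {
  id: string;
  task: string;
  runCount: number;
  skipCount: number;
  lastError?: string;
}

// Returns the handler a cron job would call on every tick.
function makeTick(
  entry: ScheduledTask,
  isBusy: () => boolean,
  execute: (task: string) => Promise<void>,
): () => Promise<void> {
  return async (): Promise<void> => {
    if (isBusy()) {
      entry.skipCount++; // no queue: this tick is simply dropped
      return;
    }
    try {
      await execute(entry.task);
      entry.runCount++;
    } catch (err) {
      entry.lastError = err instanceof Error ? err.message : String(err);
    }
  };
}
```

Dropping the tick rather than queueing it means a long-running task can never cause a backlog of scheduled runs to fire back to back.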

### Stats

- Tool count: **89 → 93** (+4 scheduled_task_* tools)
- Tests: **759 → 776** (+5 playbook tests + 14 scheduler tests, all green)
- Schema snapshot regenerated.

### Migration

None. Drop-in upgrade from v0.9.0.

---

## [0.9.0] - 2026-05-14 — Architecture redesign + guides marketplace

The largest release since v0.7. Net change vs v0.8.17: **−10,200 LOC, +14 new MCP tools, one protocol instead of two, five directories instead of seven**, plus a Reflector feedback channel that closes the loop between verifier signals and planner decisions, plus a public guides marketplace where community-contributed app knowledge ships independently of the binary.

### Architectural rewrite

- **One protocol, two transports.** REST surface (`/task`, `/tools`, `/execute/:name`, `/favorites`, `/learn`, `/screenshot`, `/abort`, `/confirm`, `/logs`, `/task-logs`) is gone. Every former REST endpoint is now an MCP tool. The HTTP daemon serves stateless MCP at `POST /mcp` alongside `/health`, `/stop`, and `/` (dashboard).
- **Five directories under `src/`.** `core/` (agent loop + pipeline + verifier + safety + skills), `tools/` (one registry, 89 granular + 6 compound), `platform/` (Windows / macOS / Linux X11 / Linux Wayland adapters + Swift host app), `llm/` (providers + credentials + knowledge), `surface/` (CLI + MCP server + dashboard). One concern per directory, no upward dependencies.
- **Legacy cascade removed.** The v0.7-era cascade (`computer-use.ts`, `ai-brain.ts`, `action-router.ts`, `generic-computer-use.ts`, 14 more modules — ~12 k LOC) deleted along with the `--legacy` flag and `_executeTaskInternal`. Tag `v0.8.17-legacy` preserves the cascade for emergency cherry-pick.
- **CLI verb rename.** `clawdcursor start` → `clawdcursor agent`; `clawdcursor serve` → `clawdcursor agent --no-llm`. Old verbs still work as deprecation aliases through 0.9.x; removed in 0.10.

### Reflector feedback (CLAWD_REFLECTOR=1)

The verifier now produces structured `ReflectionFeedback` with typed `Cause[]` and an optional `suggestedStrategy`. Six cause kinds: `no_pixel_change`, `wrong_window_focused`, `modal_intercept`, `a11y_target_missing`, `webview_blind`, `partial_text_match`. The pipeline ladder reroutes based on the dominant cause instead of just rolling down — `webview_blind` jumps straight to vision, `modal_intercept` retries after dismissal. Behind a feature flag for one cycle; default-on in 0.9.1 if telemetry is positive.
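
A rough sketch of cause-driven rerouting. The six cause kinds, `ReflectionFeedback`, `Cause`, and `suggestedStrategy` come from the paragraph above; the `weight` field, the `NextStep` values, and the "highest weight wins" rule for picking the dominant cause are assumptions for illustration only.

```typescript
type CauseKind =
  | "no_pixel_change" | "wrong_window_focused" | "modal_intercept"
  | "a11y_target_missing" | "webview_blind" | "partial_text_match";

interface Cause { kind: CauseKind; weight: number } // `weight` is an assumed field
interface ReflectionFeedback { causes: Cause[]; suggestedStrategy?: string }

type NextStep = "jump_to_vision" | "dismiss_and_retry" | "next_rung";

// Assumed rule: the dominant cause is the one with the highest weight.
function dominantCause(fb: ReflectionFeedback): Cause | undefined {
  let best: Cause | undefined;
  for (const cause of fb.causes) {
    if (best === undefined || cause.weight > best.weight) best = cause;
  }
  return best;
}

// Route on the dominant cause; anything unrecognized rolls down one rung,
// which matches the pre-Reflector behavior described above.
function nextStep(fb: ReflectionFeedback): NextStep {
  switch (dominantCause(fb)?.kind) {
    case "webview_blind": return "jump_to_vision";
    case "modal_intercept": return "dismiss_and_retry";
    default: return "next_rung";
  }
}
```

Keeping the fallback as "next rung" means that turning the feature flag off, or receiving feedback with no causes, degrades to the old ladder behavior.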

### Safety + correctness

- **Five tools promoted to Tier 2 (mutation)** after an external audit: `open_file`, `open_url`, `open_uri`, `navigate_browser`, `write_clipboard`. Each can trigger arbitrary OS handlers, network egress, or clipboard hijack — Tier 1 understated the risk.
- **Sensitive-app safety gate now actually elevates** instead of just logging. Clicking inside Outlook / 1Password / Mail / banking / private-messaging with no target label → `confirm` (not `allow`).
- **App-pattern data consolidated** into `src/core/app-categories.ts`. Single source of truth for the WebView2 settle list + sensitive-app list. The autonomous pipeline never imports it.
- **Stateless MCP HTTP transport.** Per-request transport lifecycle, `enableJsonResponse: true` so clients receive plain JSON-RPC instead of SSE event-stream framing they choke on.

### Agent-loop reliability

- **Soft-fail subtask policy.** Low-confidence verifier rejection (< 0.5) on a single subtask logs a warning and continues. Idempotent operations like "create new canvas" after `open_app("Paint")` (pixel-change zero because Paint already opened blank) no longer kill the chain at subtask 2.
- **Runaway guard on consecutive no-tool-call turns.** Three turns of degenerate model output (e.g. Kimi hitting `max_tokens` with token-loop garbage) trigger a clean rung exit instead of burning the full 5-minute task timeout.
- **Kimi `moonshot-v1-*` prose-tool-call parser updated** for the new `functions.NAME:N->{_{...}}` format the model now emits.
- **Per-task PIPELINE_DONE footer always fires** with `success/failed (reason) · path · N/M subtasks · $cost · duration`. Was missing on chain-abort + isAborted paths.
- **DPI mouse-scale fix.** Both stdio MCP and `clawdcursor agent` now use `physical/image` as the mouseScaleFactor source. Vision-driven clicks land where intended on HiDPI Windows / Retina macOS instead of being 2× too far towards top-left.
- **DPI info injected into agent prompt** so models that try to "help" by self-scaling don't pre-multiply.

### Tools

- **Tool count 75 → 89.** Fourteen new MCP tools absorbed the former REST endpoints + the marketplace surface: `submit_task`, `abort_task`, `agent_status`, `screenshot_full`, `favorites_list/_add/_remove`, `task_logs_list/_current`, `logs_recent`, `learn_app`, `submit_report`, plus two new guides-management entries.
- **Tool registry unified.** Compact (6 compounds) is now a transform over the granular registry, not a parallel catalog. One source of truth, no drift.
- **MCP `open_app` uses alias table + PlatformAdapter** instead of raw `Start-Process`. Calculator, Win11 Notepad, and other UWP apps work correctly.
- **`focus_window` AND-matches** when given both pid + title — needed for Win11's tabbed Notepad where multiple windows share a pid.
- **`type_text` preserves the user's clipboard** around its paste-as-type operation. Was silently clobbering.

### Guides marketplace (new)

clawdcursor reasons about every app from screenshots and a11y trees. For popular apps that's slow. v0.9 ships a **marketplace of community-curated app guides** the agent fetches on demand, caches locally based on usage, and uses to operate apps 5–10× faster — without ever blocking the agent loop on the network.

- **Public registry at <https://clawdcursor.com/app-guides>**, backed by the GitHub repo <https://github.com/AmrDab/clawdcursor-guides>. PR-based submissions, native GitHub identity as anti-spam, vote-issues for ratings (`vote: <app>` issues with 👍/👎 reactions aggregated nightly into `index.json`).
- **10 verified seed guides at launch**: gmail, outlook, slack, youtube (the rich-multi-task reference — 19 workflows, 36 shortcuts, 8 layout regions, 13 tips), figma, discord, excel, mspaint, olk (new Outlook), spotify. Maintainer trust labels: `trust:verified` / `trust:community` / `trust:experimental`.
- **Three new client-side modules**:
  - `src/llm/knowledge/remote-loader.ts` — `fetchGuide(app)` with timeout, conditional GET via ETag, stale-while-revalidate.
  - `src/llm/knowledge/cache.ts` — LRU + TTL (7 days, 50 entries). `touchUsage` reorders LRU on every hit, so popular guides survive eviction even when not most-recently-fetched.
  - `src/llm/knowledge/guide-linter.ts` — defense-in-depth: schema validation + prompt-injection patterns + dangerous-prose detection runs on every guide before injection, regardless of source (bundled, cached, user-override). Failed guides drop to null — agent falls back to first-principles reasoning, never poisoned-knowledge.
- **Bundled core trimmed to 2 guides** (msedge + notepad — Windows defaults that ship with every install). The other 10 curated guides moved to `seed-registry/guides/` and uploaded to the GitHub repo. Lighter binary; guides update independently of releases.
- **`clawdcursor guides` CLI rewritten**: `list`, `info <app>`, `available`, `install <app>` / `install --all`, `refresh <app>`, `remove <app>`, `clean`, `lint <file>`, `submit <file>` (lints + prints PR instructions).
- **Preprocessor fires `prefetchGuideForApp(app)` async** the moment it detects an active window — by the next task, the cache is warm. First-touch uses whatever's local; subsequent tasks are fast.
- **`learn_app` writes rerouted** to the user-override dir at `~/.clawdcursor/ui-knowledge/{app}.json` (was writing into the bundled source tree where the next install would clobber it). Auto-saves successful task patterns under `learnedWorkflows`; FIFO-capped at 20 per app.
- **Rich prompt fragment renderer** (`renderAppKnowledge`): the agent now sees SHORTCUTS / WORKFLOWS (★-marked active one first) / LAYOUT / TIPS instead of just 8 comma-joined shortcuts. Cap 6000 chars with graceful degradation; non-active workflows truncated to 180 chars so a 20-workflow guide doesn't crowd out layout.
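
The LRU + TTL behavior described for the guide cache can be sketched with a `Map`, whose insertion order doubles as recency order. The class name and method signatures are assumptions; the defaults (50 entries, 7 days) are the figures quoted above, and time is passed in explicitly so the sketch is deterministic.

```typescript
class GuideCache<V> {
  private entries = new Map<string, { value: V; storedAt: number }>();
  private maxEntries: number;
  private ttlMs: number;

  constructor(maxEntries = 50, ttlMs = 7 * 24 * 60 * 60 * 1000) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  // A hit re-inserts the entry so it becomes the most recently used.
  get(key: string, now: number): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (now - hit.storedAt > this.ttlMs) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  // Store a value, then evict least recently used entries over capacity.
  set(key: string, value: V, now: number): void {
    this.entries.delete(key);
    this.entries.set(key, { value, storedAt: now });
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }
}
```

Because a read refreshes recency but not `storedAt`, a frequently used guide survives eviction yet still expires and gets refetched once its TTL passes.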

### Router

- **Web-service redirect layer** (`src/core/router/web-services.ts`, 60-entry table). "open youtube" / "open reddit" / "open gmail" now redirects to `handleUrlNav('https://www.youtube.com')` via the OS default browser, instead of fall-through to Start-Menu search → blind-agent escalation. Closes a v0.9 failure mode where the agent typed the literal phrase "default browser" into a search bar. Native-client preference preserved: "open chrome" still launches the desktop client.
- **System-context preamble** in the blind/hybrid agent system prompt (`src/core/agent-loop/prompt.ts` section 5c): web services → `open_url(URL)`, never type "browser" into search bars, don't emit "open chrome" before "navigate" unless explicitly named.

### Verifier

- **`send_email` no longer falsely passes** when a popup steals foreground. Previous logic checked only `after.activeWindow.title` for compose-window absence — a banner popup focusing the agent's window inverted the check and the verifier reported success while Send was never clicked. Fix iterates the full `after.windows` list (`composeStillOpen = (after.windows ?? []).some(w => !w.isMinimized && composeKeywords.test(w.title))`). Also added: success-keyword detection (`message sent | email sent | sent successfully`), `not_just_saved_as_draft` anti-signal (rejects when "Draft saved" appears without success notice), expanded compose regex to include `reply`.

### Doctor

- **Post-doctor "All systems go" panel rewritten** for clarity on the two access paths: MCP server for editor (`clawdcursor mcp`) gets 89 desktop tools (or 6 compound with `--compact`); HTTP daemon (`clawdcursor agent`) for unattended autonomy. Runtime-detects whether an LLM is configured and shows "(you have one)" green or "(none yet)" yellow.

### Cross-platform integrity

- **All four OS adapters preserved.** Windows (1,220 LOC) + macOS (903 LOC) + Linux X11 (1,285 LOC) + Linux Wayland (343 LOC) — 3,751 LOC of adapter code, no regression from v0.8.
- **macOS host app intact.** `ClawdCursorHost` Swift bundle, `permission-check`, `screenshot-helper`, `clawdcursor grant` flow — all preserved + path-resolution fixed (`getPackageRoot()`) so the host app is found correctly after the directory restructure.

### Documentation

- **Professional README rewrite** (340 lines): hero badge row, Mermaid pipeline diagram with Reflector feedback edges, transport / cost-tier / cross-platform / compound-tool tables, 5-directory architecture summary. Modeled on `ollama`, `vercel/ai`, `microsoft/playwright`, `modelcontextprotocol/typescript-sdk`.
- **Post-install + post-build banners are state-aware**: skip "Run consent" / "Run doctor" lines when the user already did them on a prior install.
- **Two-path next-step routing** at install / consent / doctor: autonomous agent (`doctor` → `agent`) vs MCP-only (register `clawdcursor mcp` with editor host).
- **SKILL.md reordered**: fallback discipline first, "no task impossible" confidence second, CAN/MUST/SHOULD third — load-bearing identity preserved verbatim.
- **MACOS-SETUP, agent-guide, OPENCLAW-INTEGRATION-RECOMMENDATIONS, dashboard, website** all migrated from REST to MCP HTTP transport language.
- **`docs/internal/v0.9-readme-building-blocks.md`** + **`docs/internal/agnostic-audit-report.md`** archived as design records (moved out of the published website root before release).

### Release hygiene

- Removed orphan `docs/v0.7.5/` (v0.7-era landing page not linked anywhere).
- `package.json` gains `repository`, `homepage`, `bugs`, `author`, `keywords`.
- `.nvmrc` added (Node 20).
- CI badge URL corrected to the actual workflow filename.

---

## [0.8.8] - 2026-05-05 — Reliability + correctness: mod modifier, compact set_value, smart_click foreground OCR, invoke-element timeout

A focused reliability release closing several real bugs surfaced by a production session (issue #71) and a thorough ultrareview of the v0.8.5 work. Two of the bugs were silent failures — the worst kind for an agent — and one was a hard hang in the standalone PowerShell scripts. Plus a routine round of major-version dependency bumps (express 5, commander 14, dotenv 17, sharp 0.34) and a lint cleanup pass.

### Fixed

- **`mod` modifier now resolves correctly on every platform.** The legacy `NativeDesktop` (which `ctx.desktop` binds to in the granular tool registry) had no `mod` translation — only the v2 `PlatformAdapter` did. Calling `computer({"action":"key","combo":"mod+s"})` either threw `Unknown key: "mod"` (Win/Linux) or silently dropped the modifier and typed a literal `s` (macOS). Three coordinated fixes:
  - `src/keys.ts`: add `mod` to `KEY_ALIASES` resolved at module load to `Super` on darwin and `Control` elsewhere.
  - `src/native-desktop.ts:707-712`: extend the `macKeyPress` modifier loop to treat `mod` as `command down`. The loop did direct string comparison, so the alias alone wasn't enough.
  - `src/pipeline/playbooks/keys-blocklist.ts:14-22`: extend `normalizeCombo` so `mod+q` matches `cmd+q` on darwin (otherwise the safety gate would let `mod+q` quit-app through on macOS).
- **Compact `accessibility({"action":"set_value", ...})` was broken.** `src/tools/compact.ts:93` delegated to `set_field_value`, but no granular tool by that name was registered (only the agent-internal palettes had it). Calls returned `{isError: true, text: "delegate not registered"}`. Registered the missing tool in `getA11yDepthTools()` mirroring `a11y_expand`/`a11y_toggle`. Tool count: 74 → 75. Schema snapshot regenerated.
- **`smart_click` OCR matched text in background windows.** Full-screen OCR scoring iterated all elements and broke on the first exact match, so text in a non-focused window (e.g. Outlook visible behind a "Pick an account" dialog showing the same email) could win and cause a silent wrong-click. Refactored ranking into a `pickBest` helper that runs two passes: foreground-window first (using `activeWin.bounds`), full-screen only if foreground produced no match — with a `[WARNING: matched outside focused window]` annotation in the response so the agent has a signal to verify. From issue #71 review.
- **`invoke-element.ps1` hung on React/Electron buttons that advertise InvokePattern but block on Invoke.** The legacy try/catch fallback chain (Invoke → Toggle → bounds) only fired when a pattern *threw*, not when one blocked indefinitely. Wrapped the pattern call in `System.Threading.Tasks.Task::Run` with a 2s `Wait(timeout)`. On timeout the script emits the same `success:false + clickPoint` JSON the existing catch produces. Direct callers of the script benefit; HTTP/MCP callers were already protected by `smart_click`'s 10s outer timeout. From issue #71.
- **OpenClaw install metadata used `npm install -g clawdcursor`** but the package isn't published to npm (registry returns 404). OpenClaw following `metadata.openclaw.install` step 1 verbatim would abort before reaching `clawdcursor consent --accept`. Replaced with the documented `curl -fsSL https://clawdcursor.com/install.sh | bash` path that matches every other install surface.
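
The two-pass `smart_click` ranking from the list above can be sketched as follows. The `pickBest` name and the warning text come from the changelog; the `Rect` and `OcrHit` shapes, the exact-match rule, and the return type are simplifying assumptions.

```typescript
interface Rect { x: number; y: number; width: number; height: number }
interface OcrHit { text: string; x: number; y: number } // centre of the matched text

const isInside = (r: Rect, h: OcrHit): boolean =>
  h.x >= r.x && h.x < r.x + r.width && h.y >= r.y && h.y < r.y + r.height;

// Pass 1: only hits inside the focused window. Pass 2: anywhere on screen,
// but flagged so the caller knows the match may be in a background window.
function pickBest(
  hits: OcrHit[],
  target: string,
  foreground: Rect,
): { hit: OcrHit; warning?: string } | null {
  const matches = hits.filter((h) => h.text === target);
  const inForeground = matches.find((h) => isInside(foreground, h));
  if (inForeground) return { hit: inForeground };
  const anywhere = matches[0];
  return anywhere
    ? { hit: anywhere, warning: "[WARNING: matched outside focused window]" }
    : null;
}
```

When the same text appears both in a dialog and in the window behind it, the dialog's copy wins regardless of the order in which OCR returned the hits.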

### Changed

- **Major dependency bumps**, all CI-green across the cross-platform matrix:
  - `express` 4.21.2 → 5.2.1 (major) + `@types/express` 4 → 5
  - `commander` 12.1.0 → 14.0.3 (major)
  - `dotenv` 16.x → 17.4.2 (major)
  - `sharp` 0.33.5 → 0.34.5
  - `eslint` group bumps within v10
- **Lint hygiene** — cleared all 10 `@typescript-eslint/no-unused-vars` warnings the CI was surfacing as annotations (74 → 64 warnings). Trivial cleanup, no functional impact: dropped unused test imports (`path`, `afterEach`, `vi`, `beforeEach`, `VerifyResult`, `PipelineConfig`), removed the dead `makePipelineConfig` helper in verifiers.test.ts, renamed `step` to `_step` in `a11y-reasoner.ts:1079` (eslint config already allowed the `^_/u` prefix), and dropped unused error bindings on two `catch (e)` / `catch (err)` blocks.

### Documentation

- SKILL.md "What's new" expanded with the 0.8.8 section.
- README "Latest Release" updated.
- `docs/index.html` (homepage) bumped to v0.8.8 across title, meta tags, hero badge, agent-readable summary, and footer.

---

## [0.8.7] - 2026-05-02 — Security hardening: direct-tool safety gate, version-string single-source, tooling bumps

A security-focused patch release. The headline is a real behaviour change: every direct tool invocation — both the REST `/execute/:name` endpoint and the MCP `callTool` handler — now passes through a shared safety gate, so direct callers can no longer bypass the checks the agent loop already enforced. Plus: the version string is now single-sourced (no more `0.7.2` showing up in MCP metadata three releases late), and the dev tooling is current (TypeScript 6.0, ESLint 10).
|
|
1557
|
+
|
|
1558
|
+
### Fixed
|
|
1559
|
+
|
|
1560
|
+
- **Direct tool execution bypassed safety checks.** REST `/execute/:name` and MCP `callTool` invoked tools without consulting the same gate the agent loop used. A misconfigured client could reach `confirm`-tier or blocked tools without the expected guardrails. New `src/tools/safety-gate.ts` (~40 lines) wraps every direct invocation; both entry points (`src/index.ts`, `src/tool-server.ts`) now route through it. Read-only, blocked, and confirm-tier decisions resolve identically across REST, MCP, and the agent loop. Test coverage in `src/__tests__/tool-safety-gate.test.ts`.
|
|
1561
|
+
- **Accessibility / window / clipboard reads now use `PlatformAdapter` consistently.** `src/tools/a11y.ts` previously called the underlying OS APIs directly; it now routes through the shared adapter like the rest of the codebase, with a legacy fallback if the adapter is unavailable.
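
As an illustration of the shared safety gate described in the first Fixed entry above, here is a minimal sketch. It is not the contents of `src/tools/safety-gate.ts`: the tier names, the `gateDirectCall` function, the tier table, and the `send_email` / `format_disk` tool names are assumptions made for the example. It only shows how routing both entry points through one function makes their decisions identical.

```typescript
// Hypothetical tiers, loosely following the changelog's
// "read-only, blocked, and confirm-tier" wording.
type SafetyTier = "read_only" | "confirm" | "blocked";

type GateDecision =
  | { allowed: true }
  | { allowed: false; reason: string };

// Illustrative tier table. The project's real table is not shown in the changelog.
const TIERS: Record<string, SafetyTier> = {
  read_screen: "read_only",
  send_email: "confirm",   // hypothetical tool name
  format_disk: "blocked",  // hypothetical tool name
};

// Single decision point: unknown and blocked tools are refused,
// confirm-tier tools need an explicit confirmation flag.
function gateDirectCall(tool: string, confirmed: boolean): GateDecision {
  const tier = TIERS[tool];
  if (tier === undefined) {
    return { allowed: false, reason: `unknown tool "${tool}"` };
  }
  if (tier === "blocked") {
    return { allowed: false, reason: `"${tool}" is blocked` };
  }
  if (tier === "confirm" && !confirmed) {
    return { allowed: false, reason: `"${tool}" requires confirmation` };
  }
  return { allowed: true };
}

// Stand-ins for the REST and MCP entry points. Both delegate to the
// same gate, so they cannot drift apart.
const viaRest = (name: string, confirmed = false) => gateDirectCall(name, confirmed);
const viaMcp = (name: string, confirmed = false) => gateDirectCall(name, confirmed);
```

The design point is that neither entry point carries its own copy of the rules; a change to the gate applies to every caller at once.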
|
|
1562
|
+
|
|
1563
|
+
### Changed
|
|
1564
|
+
|
|
1565
|
+
- **Version string is single-sourced from `package.json`.** `src/index.ts` (the `McpServer` constructor) and `src/onboarding.ts` (the consent file) each kept their own hardcoded copy of the version. Both fell out of sync — `index.ts` shipped `0.7.2` in the MCP handshake for several releases until v0.8.6 caught it manually. Both now import `VERSION` from `src/version.ts`, which already reads `package.json` at runtime. Adds `tests/version-drift.test.ts`: scans `src/**/*.ts` for any literal of the current `package.json` version and fails the build if found anywhere except `src/version.ts`. Future bumps only need to touch `package.json`.
|
|
1566
|
+
- **TypeScript 5.9.3 → 6.0.3** (devDependency). Major compiler bump. `tsconfig.json` adds `"ignoreDeprecations": "6.0"` to silence the new `moduleResolution: "node"` deprecation without changing runtime behaviour — the project remains CommonJS with the same module resolution semantics. A proper migration to `nodenext` can land in a later release.
|
|
1567
|
+
- **ESLint 9 → 10 + typescript-eslint plugins** (devDependency). Major linter bump. ESLint 10 promotes `no-useless-assignment` and `preserve-caught-error` into the recommended ruleset. Resolved all 8 new errors as actual code fixes rather than rule downgrades:
|
|
1568
|
+
- `cdp-driver.ts`: removed useless `let selector = ''` initialiser (all branches assign before use).
|
|
1569
|
+
- `doctor.ts`, `ocr-reasoner.ts`: scoped `smokeOk` and `guidePrompt` as `const` inside their try blocks (they were never read outside).
|
|
1570
|
+
- `compound.ts`: removed useless `= []` initialiser; the catch always returns, so TypeScript still considers `points` definitely assigned.
|
|
1571
|
+
- `smart-interaction.ts`: eliminated the `currentA11yState` tracking variable entirely — it was always equal to the fresh `a11yContext` read at the top of each ReAct loop iteration. Three useless-assignment sites disappear by replacing references with `a11yContext` directly.
|
|
1572
|
+
- `ui-driver.ts`: rethrown `SyntaxError` now includes `{ cause: err }`.
|
|
1573
|
+
- **Routine dependency hygiene.** Playwright `1.58.2 → 1.59.1`, ws `8.19.0 → 8.20.0`, postcss + `@types/*` group bumps, GitHub Actions `setup-node@v4 → v6`, `checkout@v4 → v6`.
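
The version-drift check from the first Changed entry above can be sketched as a pure function. This is an assumption-laden illustration, not the contents of `tests/version-drift.test.ts`: `findVersionDrift` and the in-memory file map are hypothetical, and a real test would read `src/**/*.ts` from disk.

```typescript
// Returns the paths, other than the allowed one, whose text contains
// the version literal. `files` maps a path to that file's source text.
function findVersionDrift(
  files: Record<string, string>,
  version: string,
  allowedPath: string,
): string[] {
  const offenders: string[] = [];
  for (const path of Object.keys(files).sort()) {
    if (path === allowedPath) continue;
    if (files[path].indexOf(version) !== -1) offenders.push(path);
  }
  return offenders;
}

// Example input: one file still carries a hardcoded copy of the version.
const sampleFiles: Record<string, string> = {
  "src/version.ts": "export const VERSION = readPackageJson().version;",
  "src/index.ts": "new McpServer({ name: 'clawdcursor', version: '0.8.7' });",
  "src/onboarding.ts": "import { VERSION } from './version';",
};

const drift = findVersionDrift(sampleFiles, "0.8.7", "src/version.ts");
```

A test built on this would fail whenever `drift` is non-empty, which is what makes a hardcoded version string a build error rather than a release-day surprise.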
|
|
1574
|
+
|
|
1575
|
+
### Documentation
|
|
1576
|
+
|
|
1577
|
+
- SKILL.md "What's new" expanded with the 0.8.7 section. README "Latest Release" updated.
|
|
1578
|
+
- `docs/index.html` (homepage) bumped to v0.8.7 across title, meta tags, hero badge, and footer.
|
|
1579
|
+
|
|
1580
|
+
---
|
|
1581
|
+
|
|
1582
|
+
## [0.8.6] - 2026-05-01 — Polish release: MCP server version, homepage simplification, repo hygiene
|
|
1583
|
+
|
|
1584
|
+
A short follow-up to 0.8.5 that closes one user-visible bug carried over from the v0.7.x line and a handful of professionalism gaps surfaced in a pre-release audit. No schema changes, no behavior changes for agents — purely metadata, docs, and the public landing page.
|
|
1585
|
+
|
|
1586
|
+
### Fixed
|
|
1587
|
+
|
|
1588
|
+
- **`McpServer` advertised the wrong version.** `src/index.ts` constructed the MCP server with `version: '0.7.2'` and `src/onboarding.ts` wrote the same string into the consent file — both untouched since the 0.7.x line. MCP clients (Claude Code, Cursor, Windsurf, Zed) display this string in their server metadata, so users on v0.8.5 saw "clawdcursor v0.7.2" in their host UI. Both sites now read `0.8.6`. `src/index.ts:1054`, `src/onboarding.ts:31`.
|
|
1589
|
+
|
|
1590
|
+
### Added
|
|
1591
|
+
|
|
1592
|
+
- **`SECURITY.md`** — private vulnerability reporting path for a tool that runs with full Accessibility + Screen Recording permissions on the user's desktop. Points reporters at GitHub's private vulnerability reporting flow plus a mailbox fallback. Should have existed since v0.7.0; closing the gap now.
|
|
1593
|
+
|
|
1594
|
+
### Changed
|
|
1595
|
+
|
|
1596
|
+
- **Homepage simplified.** `docs/index.html` lost ~80 lines of decorative weight without losing information:
|
|
1597
|
+
- Removed the page-wide green AI-cursor mouse-follower (CSS + HTML + JS, ~60 lines). Cute, but contradicts the "serious skill, not a demo" framing.
|
|
1598
|
+
- Hero badge collapsed from a 4-fact release-summary string to a one-line `v0.8.6 — latest stable`. Release detail belongs in CHANGELOG, not the hero.
|
|
1599
|
+
- Stats grid pruned from 4 tiles to 3 — the `any AI Model` tile was filler.
|
|
1600
|
+
- "CLI Agent" mode card relabeled `CLI — testing only` to match the README's skill-first reframe (in 0.8.4) where `start` is explicitly the testing/troubleshooting path, not a recommended runtime mode.
|
|
1601
|
+
- The `clawdcursor doctor` post-install comment used to read `# verify install + wire into your agent (MCP)`; `doctor` does not write to host config files. Corrected to `# verify install — then add the MCP block to your agent host config`.
|
|
1602
|
+
- **`LICENSE`** copyright year `2026` → `2025-2026`. The earliest CHANGELOG entry is March 2025.
|
|
1603
|
+
|
|
1604
|
+
### Removed
|
|
1605
|
+
|
|
1606
|
+
- **`V0.7.5-SPEC.md`** at the repo root — describes the v0.7.5 OCR+a11y parallel-merge architecture, which was superseded by the unified blind-first pipeline in v0.8.1/v0.8.2. Five releases of stale content with zero inbound references. Preserved in git history.
|
|
1607
|
+
- **`docs/v0.7.0/`, `docs/v0.7.2/`, `docs/v0.7.12/`, `docs/v0.7.14/`** — pinned-version landing pages for releases that were never published as GitHub Releases. Not linked from the live homepage or README. `docs/v0.7.5/` kept (only pre-0.8 release with a published GitHub Release).
|
|
1608
|
+
|
|
1609
|
+
### Documentation
|
|
1610
|
+
|
|
1611
|
+
- **GitHub Releases backfilled.** Tags v0.8.0, v0.8.2, v0.8.3, v0.8.4, v0.8.5 had existed for weeks without a corresponding Releases entry — only v0.7.5 was published. All five 0.8.x releases now have a Releases entry sourced from this CHANGELOG, with v0.8.5 marked latest until v0.8.6 ships.
|
|
1612
|
+
- SKILL.md "What's new" expanded to cover 0.8.6.
|
|
1613
|
+
|
|
1614
|
+
---
|
|
1615
|
+
|
|
1616
|
+
## [0.8.5] - 2026-04-30 — Review-fix maintenance + compact-tool keyboard fix
|
|
1617
|
+
|
|
1618
|
+
Two remote review passes (six findings + ten findings) on the v0.8.4 docs uncovered one real behavior bug, several factually wrong install instructions, and a long tail of documentation drift that had built up across SKILL.md, README, docs/index.html, and source comments. This release closes all of it. 429/430 tests still pass; granular schema snapshot unchanged.
|
|
1619
|
+
|
|
1620
|
+
### Fixed
|
|
1621
|
+
|
|
1622
|
+
- **`computer({"action":"key","combo":"..."})` now works.** The compound `key` / `key_press` / `key_down` / `key_up` actions had no `argRemap`, so the schema exposed `key` (not `combo`). REST rejected `combo` as an unknown parameter; MCP silently dropped it and the granular handler crashed with `(undefined).toLowerCase()`. Implemented the remap that `compact.ts:46-47` had documented as the canonical example since v0.8.1 — `argRemap: { combo: 'key' }` on all four keyboard actions. Granular schema is unaffected; the `key` granular tool still takes `key`. `src/tools/compact.ts`.
|
|
1623
|
+
- **Stale "72 granular tools" count** in user-visible places — `clawdcursor mcp --help`, the markdown returned by `GET /docs`, plus four internal source comments. CHANGELOG v0.8.2 established 74 (72 + 2 Electron-bridge tools) as canonical; the agent-facing surfaces are now consistent. `src/index.ts`, `src/tool-server.ts`, `src/tools/compact.ts`, `src/tools/index.ts`.
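
To make the `argRemap: { combo: 'key' }` fix above concrete, here is a minimal sketch of what such a remap step could look like. The `applyArgRemap` helper is hypothetical; only the `combo` to `key` mapping comes from the entry itself.

```typescript
type Args = Record<string, unknown>;

// Rename caller-facing argument names to the names the granular
// handler expects. Names without a remap entry pass through unchanged.
function applyArgRemap(args: Args, argRemap: Record<string, string>): Args {
  const out: Args = {};
  for (const name of Object.keys(args)) {
    const hasRemap = Object.prototype.hasOwnProperty.call(argRemap, name);
    const target = hasRemap ? argRemap[name] : name;
    out[target] = args[name];
  }
  return out;
}

// The mapping named in the changelog entry.
const keyboardRemap: Record<string, string> = { combo: "key" };

// Compact-style call as a client would send it.
const remapped = applyArgRemap({ action: "key", combo: "ctrl+s" }, keyboardRemap);
```

After the remap, the handler sees `key: "ctrl+s"` and never encounters an undefined `key`, which is the crash the entry describes.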
|
|
1624
|
+
|
|
1625
|
+
### Documentation
|
|
1626
|
+
|
|
1627
|
+
- **README installer claims rewritten.** The previous wording falsely claimed the installer (1) drops files into `~/.clawdcursor`, (2) registers an MCP server in `~/.claude/settings.json`, and (3) copies SKILL.md into every detected agent's skill directory. Verified against `docs/install.sh` and `docs/install.ps1`: the installer only clones to `~/clawdcursor` (no dot), runs `npm install + build`, and `npm link`s the global shim. The dotted `~/.clawdcursor/` directory holds runtime state only. The section on wiring the skill into Claude Code now correctly says the JSON block is required, not optional.
|
|
1628
|
+
- **Compact-action surface corrections.** The README's compact-tool table used invented action names — `accessibility.read_screen` (actual: `read_tree`), `accessibility.get_focused` (`focused`), `window.set_state`/`set_bounds`/`get_active` (none exist), `system.open_app` (lives on `window`), `system.read_clipboard` (`clipboard_read`), `browser.navigate` (lives on `window`), and the entire `task` action enum (`task` has no enum — just `{instruction}`). All rewritten against `src/tools/compact.ts`. Marquee example also fixed to use real calls.
|
|
1629
|
+
- **Linux accessibility package.** The documented packages were `at-spi2-core` + `python3-gi`; the actual missing package on a fresh Ubuntu install is `gir1.2-atspi-2.0` (the AT-SPI typelib that `python3-gi` consumes). Brought into line with SKILL.md, the probe script's hint, and the platform adapter docstring.
|
|
1630
|
+
- **Compact-action tables now non-exhaustive by default.** Added a "Most-used actions" header + caveat pointing to `GET /tools?mode=compact`, and filled in the high-value entries that had been silently dropped (`accessibility.list_children`, `browser.page_context`, `window.list_displays` / `screen_size` / `switch_tab`, `computer.scroll_horizontal` / `triple_click`).
|
|
1631
|
+
- **`clawdcursor dashboard` removed** from the README CLI block — that command never existed; the dashboard is reachable at `http://127.0.0.1:3847` while `serve` or `start` is running. `status` and `consent` subcommands added to the CLI block since they were referenced in the Options block but never introduced.
|
|
1632
|
+
- **`--compact` / `--accept` flag scopes corrected.** README claimed `--compact` works on `serve`; it's mcp-only (`serve` uses `?mode=compact` on `GET /tools`). README claimed `--accept` is universal; it lives on `start` and `consent` (`serve` uses `--skip-consent`).
|
|
1633
|
+
- **"Anthropic Agent SDK" → "Claude Agent SDK"** (the official product name) across README.
|
|
1634
|
+
- **`invoke_element` recategorized** from "Window / App" to "Accessibility" in the README — matches its registration in `src/tools/a11y_depth.ts` and the SKILL.md taxonomy.
|
|
1635
|
+
- **`docs/index.html` install snippets** no longer push `clawdcursor start` as the canonical post-install step (contradicts the new "skill, not application" framing). Replaced with `clawdcursor doctor` (verify-the-install) and a footer note that `start` is testing-only. Hero badge CVE list now includes `follow-redirects`.
|
|
1636
|
+
- **SKILL.md `/health` example** now uses `<x.y.z>` placeholder instead of a hard-coded version that drifts every release. "What's new" section expanded to cover 0.8.4 + 0.8.3 + 0.8.2.
|
|
1637
|
+
- **Cost-tier ladder + "no task is impossible" callout** added to SKILL.md (lines 38, 108-118). Sets the default agent disposition: GUI + mouse + keyboard = everything you need; start at T1 (structured a11y), escalate only when the current tier fails.
|
|
1638
|
+
- **Skill-first README rewrite.** The headline now reads "The skill that gives any AI agent eyes, hands, and a keyboard on a real desktop." `start` / `task` are demoted to a "Testing and Troubleshooting" appendix with explicit guidance that agents should not invoke them — they go through MCP or the REST surface. Replaces the earlier "OS-level desktop automation server" framing.
|
|
1639
|
+
- **Stale tagline cleanup.** Removed "ears" (no audio capture exists in `src/`) from `package.json` description, SKILL.md frontmatter, and `docs/index.html` meta tags + agent-readable summary. Aligned with the README's existing "eyes, hands, and a keyboard" wording.
|
|
1640
|
+
- **Pre-existing fix while in the area:** dropped the blocking `clawdcursor serve` step from `metadata.openclaw.install` in SKILL.md. `serve` is a foreground HTTP server with no auto-exit; using it as a sequential install step would either hang the installer or leave a zombie daemon — directly contradicts the "nothing runs in the foreground" framing.
|
|
1641
|
+
|
|
1642
|
+
### Verified, not changed
|
|
1643
|
+
|
|
1644
|
+
- **Cmd+Q is blocked.** Review claimed Cmd+Q is not actually blocked by the safety layer. Verified against `src/pipeline/playbooks/keys-blocklist.ts:24` + `src/pipeline/safety/layer.ts:325-328`: it IS blocked through the SafetyLayer chokepoint via both `combo` and `key` arg paths. README is correct; no change needed.
|
|
1645
|
+
|
|
1646
|
+
---
|
|
1647
|
+
|
|
1648
|
+
## [0.8.4] - 2026-04-21 — Security maintenance + README rewrite
|
|
1649
|
+
|
|
1650
|
+
Dependency audit release. No functional changes, no schema changes, 429/430 tests still pass.
|
|
1651
|
+
|
|
1652
|
+
### Security
|
|
1653
|
+
|
|
1654
|
+
Patched every fixable advisory in the dependency tree (5 of 12 surfaced by `npm audit`). The remaining 7 moderate alerts all chain through `jimp → @nut-tree-fork/nut-js` and have no upstream fix yet; tracked for a follow-up once nut-js releases a jimp upgrade.
|
|
1655
|
+
|
|
1656
|
+
- **`vite`** → 7.3.2+ · **High** · path traversal in optimized-deps `.map` handling ([GHSA-4w7w-66w2-5vf9](https://github.com/advisories/GHSA-4w7w-66w2-5vf9)), `server.fs.deny` bypass via query strings ([GHSA-v2wj-q39q-566r](https://github.com/advisories/GHSA-v2wj-q39q-566r)), arbitrary file read via dev-server WebSocket ([GHSA-p9ff-h696-f583](https://github.com/advisories/GHSA-p9ff-h696-f583)).
|
|
1657
|
+
- **`path-to-regexp`** → 0.1.13+ · **High** · ReDoS via multiple route parameters ([GHSA-37ch-88jc-xwx2](https://github.com/advisories/GHSA-37ch-88jc-xwx2)).
|
|
1658
|
+
- **`picomatch`** → 4.0.4+ · **High** · method injection in POSIX character classes + ReDoS via extglob quantifiers ([GHSA-3v7f-55p6-f55p](https://github.com/advisories/GHSA-3v7f-55p6-f55p), [GHSA-c2c7-rcm5-vvqj](https://github.com/advisories/GHSA-c2c7-rcm5-vvqj)).
|
|
1659
|
+
- **`hono`** → 4.12.14+ · Moderate · HTML injection in `hono/jsx` SSR via unsafe attribute names ([GHSA-458j-xx4x-4375](https://github.com/advisories/GHSA-458j-xx4x-4375)).
|
|
1660
|
+
- **`follow-redirects`** → 1.15.12+ · Moderate · custom auth headers leaked across cross-domain redirects ([GHSA-r4q5-vmmm-2653](https://github.com/advisories/GHSA-r4q5-vmmm-2653)).
|
|
1661
|
+
|
|
1662
|
+
### Changed
|
|
1663
|
+
|
|
1664
|
+
- **README rewrite.** Removed stale "What's New in v0.8.0 — V2 Architecture" headliner (v0.8.0's V2-vs-legacy split was unified in v0.8.2 — no opt-in flag, no two pipelines). Pipeline section now reflects the unified blind → hybrid → vision router, the `safety.evaluate()` chokepoint, ground-truth verification, and the v0.8.3 runaway guard. Tool surface reorganized around the 6-tool compact catalog and the 74-tool granular catalog. Tone tightened; marketing phrasing trimmed.
|
|
1665
|
+
|
|
1666
|
+
---
|
|
1667
|
+
|
|
1668
|
+
## [0.8.3] - 2026-04-19 — Hotfix: "Outlook keeps opening" + runaway guard
|
|
1669
|
+
|
|
1670
|
+
A user reported Outlook launching repeatedly during a test. Root-cause diagnosis traced this to three compounding failures: (1) `PlatformAdapter.openApp` spawned a new instance even when the app was already running, (2) the escalation ladder (router → blind → hybrid → vision) re-ran `open_app` at each rung because earlier rungs couldn't verify success through New Outlook's sparse WebView2 accessibility tree, (3) `clawdcursor stop` only killed the `start` process on port 3847, missing `serve` (on a different port, or on the same port under a different process) and `mcp` (stdio, no port) entirely. A stale `serve` kept receiving MCP traffic after the user thought they'd stopped everything.
|
|
1671
|
+
|
|
1672
|
+
### Fixed
|
|
1673
|
+
|
|
1674
|
+
- **`openApp` / `launchApp` idempotency** (Windows + macOS + Linux). When the target app already has a visible window AND the caller didn't set `alwaysNewInstance: true` AND no `url` is passed, the adapter now focuses the existing window and returns its pid instead of spawning another instance. Match policy: case-insensitive exact processName → processName substring → title substring → UWP AppId tail. Closes the "N windows of Outlook stacking up" class of bug under any retry loop. `src/v2/platform/{windows,macos,linux}.ts`.
|
|
1675
|
+
- **Agent runaway guard** — if the agent calls the same tool + identical args ≥ 3 times within the last 6 turns, the loop exits with `give_up` and a targeted message suggesting `detect_webview_apps` when the target is likely Electron/WebView2. Prevents the generalized "retry-loop-because-a11y-is-opaque" anti-pattern. `src/pipeline/agent/agent.ts`.
|
|
1676
|
+
- **`clawdcursor stop` now sweeps all modes.** After the graceful `/stop` on port 3847, iterates every pidfile in `~/.clawdcursor/*.pid`, SIGTERMs any live pid, SIGKILLs after 500ms if still running, and unlinks the pidfile. Catches `mcp` (stdio-only), zombie `serve`, and any start/serve on a non-default port. `src/index.ts`.
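
The runaway guard in the list above (same tool and identical args at least 3 times within the last 6 turns) can be sketched as follows. This is an illustration under stated assumptions, not the code in `src/pipeline/agent/agent.ts`: the `ToolCall` shape, the `fingerprint` helper, and the choice to check only the most recent call are all hypothetical.

```typescript
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

const WINDOW_TURNS = 6; // "within the last 6 turns"
const REPEAT_LIMIT = 3; // "at least 3 times"

// Serialize with sorted top-level keys so {a, b} and {b, a} compare equal.
// Shallow only: nested objects keep their own key order.
function fingerprint(call: ToolCall): string {
  const keys = Object.keys(call.args).sort();
  return call.tool + ":" + JSON.stringify(keys.map((k) => [k, call.args[k]]));
}

// True when the most recent call has occurred REPEAT_LIMIT or more times
// inside the trailing window. Checked after every turn, this catches any
// fingerprint the moment it reaches the limit.
function isRunaway(history: ToolCall[]): boolean {
  const recent = history.slice(-WINDOW_TURNS);
  if (recent.length === 0) return false;
  const latest = fingerprint(recent[recent.length - 1]);
  const repeats = recent.filter((c) => fingerprint(c) === latest).length;
  return repeats >= REPEAT_LIMIT;
}

const openOutlook: ToolCall = { tool: "open_app", args: { name: "Outlook" } };
const readScreen: ToolCall = { tool: "read_screen", args: {} };
```

For example, `isRunaway([openOutlook, readScreen, openOutlook, readScreen, openOutlook])` reports a runaway, while two repeats in the same window do not.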
|
|
1677
|
+
|
|
1678
|
+
### Notes
|
|
1679
|
+
|
|
1680
|
+
- Stale-pidfile cleanup at startup was already correct via `claimPidFile` (checks `isProcessAlive(existingPid)` and overwrites when dead) — no code change needed there; the issue was exclusively `stop`.
|
|
1681
|
+
- Tests: 429 / 430 pass (1 skipped, same as 0.8.2). No schema snapshot change — these are behavioral fixes, not catalog changes.
|
|
1682
|
+
|
|
1683
|
+
---
|
|
1684
|
+
|
|
1685
|
+
## [0.8.2] - 2026-04-19 — Session reliability, force-focus, Electron bridge
|
|
1686
|
+
|
|
1687
|
+
First-time-user review surfaced six concrete pain points. This release fixes every one.
|
|
1688
|
+
|
|
1689
|
+
### Fixed
|
|
1690
|
+
|
|
1691
|
+
- **Silent 401 mid-session** (the session-killer). Previous versions compared the incoming Bearer token against an in-memory `SERVER_TOKEN` only. A second clawdcursor process (stale pidfile takeover, or a concurrent mode) rewrote the token FILE without updating the first server's in-memory copy — clients reading the file silently lost auth. `/health` kept returning 200 so the failure was invisible. Fix: `requireAuth` now accepts EITHER the in-memory token OR the current on-disk token (mtime-cached, ~free). Drift is logged once with a recovery hint. `src/server.ts`.
|
|
1692
|
+
- **`focus_window` force-to-front on Windows.** Previous implementation called `SetForegroundWindow` which the OS blocks when the caller isn't the current foreground process. New implementation uses the full sequence: `ShowWindow(SW_RESTORE)` → topmost-toggle → `AttachThreadInput` with the current foreground thread → `AllowSetForegroundWindow(ASFW_ANY)` → `BringWindowToTop` → `SetForegroundWindow`, with an Alt-key synthetic fallback. Raises any window through Windows' foreground lock. `scripts/ps-bridge.ps1`.
|
|
1693
|
+
- **Richer validation errors.** REST `/execute` rejections now carry the full expected tool signature. A missing param returns `Missing required parameter "target". Expected smart_click(target: string, processId?: number).` — agents no longer have to roundtrip to `/docs`. `src/tool-server.ts`.
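
The "in-memory token OR current on-disk token, mtime-cached" rule from the first Fixed entry above can be sketched like this. It is a simplified illustration, not the `requireAuth` in `src/server.ts`: `makeTokenCheck` and the `TokenSource` interface are hypothetical, and the file access is injected so the example stays self-contained.

```typescript
// Abstracts the token file. In Node, mtimeMs() could wrap
// fs.statSync(path).mtimeMs and read() could wrap fs.readFileSync(path, "utf8").
interface TokenSource {
  mtimeMs(): number;
  read(): string;
}

// Builds a checker that accepts either the token held in memory or the
// token currently on disk. The file is re-read only when its mtime changes.
function makeTokenCheck(inMemoryToken: string, disk: TokenSource) {
  let cachedMtime = -1;
  let cachedToken = "";
  return (presented: string): boolean => {
    if (presented === inMemoryToken) return true;
    const mtime = disk.mtimeMs();
    if (mtime !== cachedMtime) {
      cachedToken = disk.read().trim();
      cachedMtime = mtime;
    }
    // An empty file must never authenticate an empty header.
    return cachedToken !== "" && presented === cachedToken;
  };
}
```

A production version would likely compare tokens in constant time and log the drift once, as the entry mentions; both are omitted here to keep the sketch short.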
|
|
1694
|
+
|
|
1695
|
+
### Added
|
|
1696
|
+
|
|
1697
|
+
- **Electron / WebView2 detection.** New MCP tools `detect_webview_apps` and `relaunch_with_cdp` (also exposed via compact `system({"action":"detect_webview"})` / `system({"action":"relaunch_with_cdp"})`). Recognises olk (New Outlook), Teams, Discord, Slack, VS Code, GitHub Desktop, Notion, Obsidian, Spotify. When detected, probes ports 9222/9223/9229/8315 for a live CDP endpoint; if found, tells the agent to attach via `browser({"action":"connect"})`. If not, shows the exact relaunch command (e.g. `discord --remote-debugging-port=9222`) so CDP can be enabled and the sparse UIA tree bypassed entirely. `src/tools/electron_bridge.ts`.
|
|
1698
|
+
- **`drag_path` documentation clarity.** Existing `mouse_drag_stepped` / compact `computer({"action":"drag_path","path":"[...]"})` now explicitly documented for freehand curve drawing (Paint, Figma, canvas apps). SKILL.md "Quick reference" covers when to use `drag_path` vs `drag`.
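
The port-probing step of the Electron / WebView2 detection entry above can be sketched as a loop over the four listed ports. This is not the code in `src/tools/electron_bridge.ts`: `findCdpPort` and the injected `Probe` function are assumptions. A real probe might issue an HTTP GET to `http://127.0.0.1:<port>/json/version`, the version endpoint Chromium exposes when started with `--remote-debugging-port`.

```typescript
// Ports listed in the changelog entry.
const CDP_PORTS = [9222, 9223, 9229, 8315];

// A probe resolves to true when a live CDP endpoint answers on the port.
type Probe = (port: number) => Promise<boolean>;

// Returns the first port whose probe succeeds, or null when none answers.
// Ports are tried in order, one at a time.
async function findCdpPort(
  probe: Probe,
  ports: number[] = CDP_PORTS,
): Promise<number | null> {
  for (const port of ports) {
    try {
      if (await probe(port)) return port;
    } catch {
      // A throwing probe (for example, connection refused) means
      // there is no endpoint on this port; keep looking.
    }
  }
  return null;
}
```

Injecting the probe keeps the loop testable without a running browser; when the result is `null`, the caller falls back to suggesting a relaunch command, as the entry describes.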
|
|
1699
|
+
|
|
1700
|
+
### Changed
|
|
1701
|
+
|
|
1702
|
+
- **SKILL.md pushes compact mode harder.** Top of doc now carries a directive callout: *"If you are an LLM reading this: YOU SHOULD BE USING COMPACT MODE."* with MCP config + REST URL. Granular stays available but is explicitly labeled the power-user / larger-prompt option.
|
|
1703
|
+
- **SKILL.md web-app keyboard warning.** Web-wrapped apps (Outlook, Teams, Gmail) treat `Escape` as "close dialog/modal" — sometimes closing the compose window. Documented: do not use Escape to dismiss autocompletes in web apps; use arrow keys + Enter or click-away.
|
|
1704
|
+
- **Error-recovery table** expanded with Electron-vs-true-canvas split, v0.8.2 auth recovery, v0.8.2 force-focus note, and the `drag_path` vs `drag` distinction.
|
|
1705
|
+
|
|
1706
|
+
### Tests
|
|
1707
|
+
|
|
1708
|
+
- 429 / 430 passing (one skipped, same as 0.8.0).
|
|
1709
|
+
- Schema snapshot regenerated → 74 granular tools (72 + 2 Electron bridge).
|
|
1710
|
+
- Live smoke: token auth survives a second `clawdcursor serve`; `focus_window` raises Paint through a full-screen window; `detect_webview_apps` correctly flags Outlook / Teams / VS Code when any are open.
|
|
1711
|
+
|
|
1712
|
+
### Consolidates v0.8.1 (never tagged)
|
|
1713
|
+
|
|
1714
|
+
0.8.1-alpha.0 through -alpha.N shipped unified-pipeline + compact-MCP + Linux AT-SPI + Wayland routing on the feature branch. They roll into 0.8.2 as a single stable release. See the v0.8.1-alpha tag range in the git history for per-tranche detail; headline features:
|
|
1715
|
+
|
|
1716
|
+
- **Unified blind/hybrid/vision agent** — one loop, three modes. Replaces the v0.8.0 split `text-agent` + `vision-agent` with a single harness using native `tool_use` (Anthropic) / `tool_calls` (OpenAI) / prose-JSON fallback.
|
|
1717
|
+
- **Compact MCP surface** — 6 compound tools (`computer`, `accessibility`, `window`, `system`, `browser`, `task`) that collapse the full capability into ~1,500 tokens of catalog. Anthropic-Computer-Use shape extended across the whole product. `clawdcursor mcp --compact` or `GET /tools?mode=compact`.
|
|
1718
|
+
- **PlatformAdapter widened** — `mouseDown/Up`, `keyDown/Up`, `setWindowState`, `setWindowBounds`, `listDisplays`, `waitForElement`, widened `InvokeAction` (`expand`/`collapse`/`toggle`/`select`/`get-value`), richer `UiElement` state flags.
|
|
1719
|
+
- **Linux AT-SPI bridge** — read-only first pass via `python3-gi` + `gir1.2-atspi-2.0`. Linux a11y methods (`getUiTree`, `findElements`, `getFocusedElement`, `waitForElement`) now return real data on boxes where the bridge dependencies are present. `invokeElement` still stubbed — tracked for a follow-up pass.
|
|
1720
|
+
- **Linux Wayland input routing** — `ydotool` (mouse + keyboard) or `wtype` (keyboard fallback) detected at init. X11 path unchanged; Wayland no longer silently mis-fires through nut-js.
|
|
1721
|
+
- **Per-capability palettes + compound vision tools** — text-agent turns now see a 6-10 tool scoped palette based on the subtask's capability (`app_launch` / `text_input` / `navigation` / `form_fill` / `spatial` / `file_ops` / `window_mgmt` / `general`). Vision-agent turns see 3 compound `mouse` / `keyboard` / `window` tools with action enums. ~12× fewer catalog tokens per turn.
|
|
1722
|
+
- **Pretty TTY logs with HH:MM:SS timestamps** — layer-tagged (`[router]`, `[blind]`, `[vision]`, `[safety]`, etc.), no per-line repetition, `CLAWD_LOG=pretty` default on TTY.
|
|
1723
|
+
- **SKILL.md rewrite** — reviewed by a Sonnet subagent against legacy v0.6.3/v0.7.14 tone, verified model-agnostic + OS-agnostic, restored "USE AS A FALLBACK" + "IMPORTANT — READ THIS BEFORE ANYTHING ELSE" directive callouts and Sensitive App Policy.
|
|
1724
|
+
|
|
1725
|
+
---
|
|
1726
|
+
|
|
1727
|
+
## [0.8.0] - 2026-04-16 — V2 Architecture (opt-in)
|
|
1728
|
+
|
|
1729
|
+
A ground-up reimagining of the internal pipeline. Opt in with `clawdcursor start --v2`. The legacy pipeline is unchanged and remains the default.
|
|
1730
|
+
|
|
1731
|
+
### Added
|
|
1732
|
+
|
|
1733
|
+
- **`--v2` flag on `clawdcursor start`** — activates the new 3-layer architecture: Router → VisionAgent → Verifier. No effect on MCP, `serve`, or legacy `start`.
|
|
1734
|
+
- **`src/v2/platform/`** — platform abstraction. Single `PlatformAdapter` interface with `macos.ts`, `windows.ts`, `linux.ts` implementations. Replaces 142+ scattered `if (process.platform === 'darwin')` branches across 34 files. Business logic no longer sees `process.platform`. Adding a new OS = one file.
|
|
1735
|
+
- **`src/v2/verifier/`** — `GroundTruthVerifier`. Six independent signals decide whether a task actually completed: pixel diff, window change, focus change, OCR delta, task-specific assertions (`send_email`, `navigate_url`, `open_app`, `type_text`, `search`, `compose_message`, `create_file`), and anti-patterns (error dialogs, "cannot send", "draft saved", invalid recipient, auth failed). Weighted voting with hard-fail rules on anti-patterns. Cannot be fooled by LLM self-reported "done".
|
|
1736
|
+
- **`src/v2/agent/`** — `VisionAgent`: a single vision-first tool-use loop. 16 tools (`screenshot`, `read_screen`, `list_windows`, `click`, `drag`, `scroll`, `type`, `key`, `invoke_element`, `set_field_value`, `open_app`, `focus_window`, `read_clipboard`, `write_clipboard`, `wait`, `done`). 6-rule system prompt (down from 36). Model-agnostic via existing `callVisionLLM`.
|
|
1737
|
+
- **`src/v2/orchestrator.ts`** — `PipelineV2` wires Router → VisionAgent → Verifier with before/after state capture.
|
|
1738
|
+
- **Hardened JSON parser** — tolerates trailing braces, markdown code fences, and other common LLM malformations. Balanced-brace extraction as fallback.
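
Balanced-brace extraction, named above as the parser's fallback, can be sketched as a single scan that tracks depth and skips braces inside string literals. `extractFirstJsonObject` and `parseLenient` are hypothetical names; the project's actual parser is not shown in this changelog and may handle more cases.

```typescript
// Return the first balanced {...} span in `text`, ignoring braces that
// appear inside JSON string literals. Returns null if none is balanced.
function extractFirstJsonObject(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null;
}

// Try strict parsing first; fall back to the extracted span.
function parseLenient(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    const candidate = extractFirstJsonObject(text);
    if (candidate === null) throw new Error("no JSON object found");
    return JSON.parse(candidate);
  }
}
```

With this fallback, input such as `{"a":1}}` (a trailing brace) or a JSON object wrapped in prose or a code fence still parses, because only the first balanced span reaches `JSON.parse`.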
|
|
1739
|
+
|
|
1740
|
+
### Fixed
|
|
1741
|
+
|
|
1742
|
+
- **False positives.** The legacy pipeline reports `UNVERIFIED_SUCCESS` when the agent claims "done" but the screen didn't change. The V2 verifier catches this class: in a live email-send test the agent said "Email sent" while a "Cannot send" dialog was on screen, and V2 correctly rejected the claim. (The legacy pipeline's behaviour is unchanged; this fix applies only when `--v2` is set.)
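
The verifier's "weighted voting with hard-fail rules on anti-patterns" can be sketched as below. The weights, the 0.5 threshold, and the `verify` / `Signal` / `Verdict` names are assumptions for illustration; the changelog does not state the real values used by `GroundTruthVerifier`.

```typescript
interface Signal {
  name: string;
  passed: boolean;
  weight: number;
}

interface Verdict {
  success: boolean;
  score: number; // fraction of total weight that passed, 0..1
  reason: string;
}

// Any detected anti-pattern fails the task outright, regardless of how
// many other signals passed. Otherwise the weighted score decides.
function verify(
  signals: Signal[],
  antiPatterns: string[],
  threshold = 0.5, // illustrative threshold, not from the changelog
): Verdict {
  if (antiPatterns.length > 0) {
    return { success: false, score: 0, reason: `anti-pattern: ${antiPatterns[0]}` };
  }
  const total = signals.reduce((sum, s) => sum + s.weight, 0);
  const passed = signals.reduce((sum, s) => sum + (s.passed ? s.weight : 0), 0);
  const score = total > 0 ? passed / total : 0;
  return {
    success: score >= threshold,
    score,
    reason: `${passed}/${total} weight passed`,
  };
}
```

In the email-send case described above, a detected "Cannot send" dialog would be passed in `antiPatterns`, so the verdict is a failure even if the agent reported success and the screen changed.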
|
|
1743
|
+
|
|
1744
|
+
### Testing
|
|
1745
|
+
|
|
1746
|
+
Smoke-tested on macOS with Anthropic Claude Haiku (text) + Sonnet (vision):
|
|
1747
|
+
|
|
1748
|
+
| Task | Time | Verdict |
|
|
1749
|
+
|------|------|---------|
|
|
1750
|
+
| Open TextEdit and type | 30s | ✅ (4/6 signals) |
|
|
1751
|
+
| Calculator: 47+53=100 | 65s | ✅ (5/6 signals, zero parse errors) |
|
|
1752
|
+
| Safari → github.com | 45s | ✅ (6/6 signals) |
|
|
1753
|
+
| Notes: create note | 182s | ✅ (6/6 signals) |
|
|
1754
|
+
| Email send (failing server) | 86s | ❌ **Correctly rejected** — legacy would have reported success |
|
|
1755
|
+
|
|
1756
|
+
### Platform Safety
|
|
1757
|
+
|
|
1758
|
+
No legacy code modified. Windows, Linux, and MCP paths untouched. v2 code is entirely under `src/v2/`.
|
|
1759
|
+
|
|
1760
|
+
## [0.7.14] - 2026-04-13 — Full macOS Keyboard Automation + Platform-Aware Pipeline
|
|
1761
|
+
|
|
1762
|
+
### Fixed
|
|
1763
|
+
- **macOS keystrokes silently dropped** — root cause: `CGEvent.post()` from the Swift helper is blocked by macOS TCC when the helper is spawned as a child of Node.js. `keyPress()` and `typeText()` on macOS now route through `osascript` + System Events (the Apple-sanctioned method). All keyboard shortcuts (Cmd+V, Cmd+N, Shift+Cmd+D, etc.) now work correctly.
|
|
1764
|
+
- **Single-char keys losing modifiers** — `keycodeForCharacter()` lookup added to `ClawdCursorHelper`; modifiers are no longer discarded for Cmd+letter combos.
|
|
1765
|
+
- **`asDouble()` coercion** — click/drag coordinates sent as integers (common from some LLMs) no longer fail with a type mismatch in the Swift helper.
|
|
1766
|
+
- **`keycodeForCharacter` fallback** — now returns an error for unmapped characters instead of silently falling back to the 'v' keycode.
|
|
1767
|
+
- **Permission check inconsistency** — `doctor`, `status`, and `readiness.ts` all now query the same canonical path: Host `/status` → `permission-check` binary → direct fallback. No more false "granted" reports.
|
|
1768
|
+
- **Screenshot capture CPU spin** — replaced `CGWindowListCreateImage` (triggers ReplayKit CPU spin bug on macOS 14+) with a delegated `screenshot-helper` subprocess.
|
|
1769
|
+
- **A11y false positive** — `isShellAvailable()` now tests actual window access (`p.windows.length`) instead of `processes.length`, which worked without Accessibility permission.
|
|
1770
|
+
- **Node.js v25 crash** — `EINVAL`/`setTypeOfService` socket error from undici's internal QoS call is now caught and suppressed (non-fatal).
|
|
1771
|
+
- **Dock click zone** — reduced from 60px to 30px on macOS (Dock is thinner than the Windows taskbar).
|
|
1772
|
+
- **Browser URL bar shortcut** — `Cmd+L` used on macOS (was `Ctrl+L`, which does nothing in macOS browsers).
|
|
1773
|
+
|
|
1774
|
+
### Added
|
|
1775
|
+
- **`macMailEmailFlow`** — deterministic email flow for macOS Mail.app (Cmd+N, Tab to subject/body, Cmd+Shift+D to send).
|
|
1776
|
+
- **`clawdcursor grant` command** — triggers macOS system permission dialogs directly from the CLI.
|
|
1777
|
+
- **115 Apple shortcuts** — Mail, Safari, Notes, Messages, Terminal added to the shortcut database.
|
|
1778
|
+
- **`scripts/test-macos-fixes.sh`** — one-shot E2E verification script: rebuild, binary check, permission consistency, screenshot capture, doctor cross-check.
|
|
1779
|
+
- **`--request-screen-recording` flag** on `permission-check` binary — optional TCC dialog trigger for Screen Recording.
|
|
1780
|
+
- **`processPath` + `bundleId`** in all permission check responses — aids TCC debugging.
|
|
1781
|
+
- **30s TTL cache** on A11y shell availability — permission grants mid-session are now detected without restart.
|
|
1782
|
+
- **macOS native binary verification** in `scripts/verify-install.js` — warns on missing binaries at `npm install` time.
|
|
1783
|
+
- **`setup` script auto-builds** native binaries on macOS (inside `npm run setup`).
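
The 30s TTL cache on A11y shell availability listed above can be sketched as a small wrapper around a boolean check. `cachedCheck` is a hypothetical helper, and the clock is injectable only so the behaviour can be exercised without waiting 30 seconds.

```typescript
// Wrap a boolean check so its result is reused for `ttlMs` milliseconds.
// After expiry the next call re-runs the check, which is how a permission
// granted mid-session gets picked up without a restart.
function cachedCheck(
  check: () => boolean,
  ttlMs: number,
  now: () => number = Date.now,
): () => boolean {
  let value = false;
  let expiresAt = -Infinity;
  return (): boolean => {
    const t = now();
    if (t >= expiresAt) {
      value = check();
      expiresAt = t + ttlMs;
    }
    return value;
  };
}

// Usage with a hypothetical probe: re-check at most once every 30 seconds.
// const isA11yAvailable = cachedCheck(probeAccessibility, 30_000);
```

The trade-off is a bounded staleness window: a revoked or granted permission is noticed within one TTL rather than immediately.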
|
|
1784
|
+
|
|
1785
|
+
### Changed
|
|
1786
|
+
- **`build.sh`** — marked executable in git, fails fast on missing binaries (was silently warning), better error guidance.
|
|
1787
|
+
- **Installer** — verifies all 4 required binaries (not just `ClawdCursorHost`), uses `bash ./build.sh` for portability.
|
|
1788
|
+
- **`doctor.ts`** — permission check unified via `native-helper` module; triggers system permission dialogs if denied.
|
|
1789
|
+
- **Email flow keyboard shortcuts** — platform-aware: `Ctrl+Enter` → `Shift+Cmd+D` on macOS, `Ctrl+H` → `Cmd+Option+F` for Find & Replace.
|
|
1790
|
+
- **`sharp`** bumped `^0.33.0` → `^0.33.5`.
|
|
1791
|
+
|
|
1792
|
+
### Platform Safety
|
|
1793
|
+
No Windows or Linux code paths affected. All macOS changes are gated behind `IS_MAC` / `process.platform === 'darwin'` / `isMacOS()`.
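
As an illustration of that gating, the sketch below selects a shortcut set by platform. The type and function names (`ShortcutSet`, `shortcutsFor`) are hypothetical; the key combinations are the ones listed in this release's entries, and the platform string is passed in rather than read from `process.platform` so the example stays self-contained.

```typescript
// Hypothetical shape; only the key combinations come from the changelog entries.
interface ShortcutSet {
  focusUrlBar: string;
  sendEmail: string;
  findAndReplace: string;
}

const MAC_SHORTCUTS: ShortcutSet = {
  focusUrlBar: "Cmd+L",
  sendEmail: "Shift+Cmd+D",
  findAndReplace: "Cmd+Option+F",
};

const DEFAULT_SHORTCUTS: ShortcutSet = {
  focusUrlBar: "Ctrl+L",
  sendEmail: "Ctrl+Enter",
  findAndReplace: "Ctrl+H",
};

// macOS gets its own table; every other platform keeps the existing behaviour.
function shortcutsFor(platform: string): ShortcutSet {
  return platform === "darwin" ? MAC_SHORTCUTS : DEFAULT_SHORTCUTS;
}
```

Because only the `"darwin"` branch differs, Windows and Linux callers keep receiving the same table as before.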
|
|
1794
|
+
|
|
1795
|
+
## [0.7.13] - 2026-04-10 — Unified Permission Checks + Screenshot Helper
|
|
1796
|
+
|
|
1797
|
+
### Fixed
|
|
1798
|
+
- **Permission check fragmentation** — doctor, status, and readiness each used different permission APIs, producing contradictory results. All now route through `ClawdCursorHost /status` → `permission-check` binary → direct `AXIsProcessTrusted` fallback.
|
|
1799
|
+
- **Screenshot CPU spin** — delegated `takeScreenshot()` to `screenshot-helper` subprocess, eliminating the ReplayKit CPU spike on macOS 14+.
|
|
1800
|
+
- **Installer binary verification** — now checks all 4 required binaries (`ClawdCursorHost`, `clawdcursor-helper`, `screenshot-helper`, `permission-check`) instead of just `ClawdCursorHost`.
|
|
1801
|
+
- **`build.sh` silent failures** — `swift build` errors now fail the build immediately with actionable guidance.
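
The unified permission check is described as a chain of fallbacks. A minimal sketch of that idea, assuming each source can be wrapped as a probe that either returns a status, returns nothing, or throws: the interfaces and `checkPermissions` are hypothetical, and the probes are synchronous here only to keep the example short (real probes would likely be asynchronous).

```typescript
interface PermissionStatus {
  accessibility: boolean;
  screenRecording: boolean;
}

// Hypothetical probe wrapper: `check` returns undefined or throws when the source is unavailable.
interface PermissionProbe {
  name: string;
  check: () => PermissionStatus | undefined;
}

interface PermissionReport extends PermissionStatus {
  source: string;
}

// Try each probe in order and report the first one that answers.
function checkPermissions(probes: PermissionProbe[]): PermissionReport {
  for (const probe of probes) {
    try {
      const status = probe.check();
      if (status !== undefined) {
        return { ...status, source: probe.name };
      }
    } catch {
      // This source is unavailable; fall through to the next one.
    }
  }
  // Nothing answered: report everything as denied rather than guessing.
  return { accessibility: false, screenRecording: false, source: "none" };
}
```

Routing every caller through one function like this is what keeps doctor, status, and readiness from disagreeing with each other.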
|
|
1802
|
+
|
|
1803
|
+
### Added
|
|
1804
|
+
- **`clawdcursor grant` command** — triggers macOS system permission dialogs for Accessibility and Screen Recording.
|
|
1805
|
+
- **`processPath` + `bundleId`** in permission check responses for TCC debugging.
|
|
1806
|
+
- **`--request-screen-recording` flag** on `permission-check` binary.
|
|
1807
|
+
|
|
1808
|
+
## [0.7.12] - 2026-04-09 — Comprehensive macOS TCC Fix
|
|
1809
|
+
|
|
1810
|
+
### Fixed
|
|
1811
|
+
- **Bash pipeline bug** — `set -o pipefail` added; build failures now properly detected (was silently passing due to pipeline exit status bug)
|
|
1812
|
+
- **Ad-hoc signing by default** — build.sh now always signs the app (required for TCC on macOS 26+ Tahoe where unsigned binaries don't appear in privacy settings)
|
|
1813
|
+
- **Build error capture** — uses temp file instead of pipe to properly capture exit status
|
|
1814
|
+
- **TCC permission check** — runs permission-check after build to show current accessibility/screen recording status
|
|
1815
|
+
|
|
1816
|
+
### Changed
|
|
1817
|
+
- **build.sh rewritten** — cleaner structure, ad-hoc signing is default (not optional), signature verification added
|
|
1818
|
+
- **Codesign uses --deep** — ensures all nested binaries are signed
|
|
1819
|
+
- **Installer shows TCC status** — tells user exactly which permissions need to be granted and where
|
|
1820
|
+
|
|
1821
|
+
### Technical Details
|
|
1822
|
+
The core issue was that TCC (Transparency, Consent, and Control) on macOS binds permissions to the code-signing identity. Without signing:
|
|
1823
|
+
- On macOS 26+ (Tahoe), unsigned binaries don't appear in System Settings privacy panels at all
|
|
1824
|
+
- Users saw "ClawdCursorHost binary not found" errors even though install appeared to succeed
|
|
1825
|
+
|
|
1826
|
+
Reference: mediar-ai/mcp-server-macos-use for TCC permission handling patterns.
|
|
1827
|
+
|
|
1828
|
+
## [0.7.11] - 2026-04-09 — macOS Installer Fix
|
|
1829
|
+
|
|
1830
|
+
### Fixed
|
|
1831
|
+
- **macOS installer now fails loudly if native host build fails** — was silently swallowing build errors and claiming "optional fallback" that doesn't exist
|
|
1832
|
+
- **Added verification step** — installer explicitly checks ClawdCursorHost binary exists before declaring success
|
|
1833
|
+
- **Show build output** — Swift build errors are now visible instead of redirected to /dev/null
|
|
1834
|
+
- **Clear error messages** — tells users exactly what went wrong and how to fix it (xcode-select --install, manual rebuild, etc.)
|
|
1835
|
+
|
|
1836
|
+
### Changed
|
|
1837
|
+
- macOS native host is now correctly marked as REQUIRED, not optional
|
|
1838
|
+
- Installer exits with error code 1 if native build fails on macOS
|
|
1839
|
+
|
|
1840
|
+
## [0.7.10] - 2026-04-08 — Guided Setup Flow
|
|
1841
|
+
|
|
1842
|
+
### Changed
|
|
1843
|
+
- **Installer shows next steps** — after install, displays clear guidance: `clawdcursor doctor` → `clawdcursor start`
|
|
1844
|
+
- **Doctor shows run options** — after passing all checks, shows both `start` (full agent) and `serve` (tools-only) modes
|
|
1845
|
+
- **Consent shows next step** — after granting consent, directs users to `clawdcursor doctor`
|
|
1846
|
+
|
|
1847
|
+
## [0.7.9] - 2026-04-08 — UX Improvements
|
|
1848
|
+
|
|
1849
|
+
### Changed
|
|
1850
|
+
- **macOS permission messages** — now direct users to enable "ClawdCursor" instead of "Terminal/Node"
|
|
1851
|
+
- **Screen Recording path** — updated to "Screen & System Audio Recording" (macOS Sequoia naming)
|
|
1852
|
+
|
|
1853
|
+
## [0.7.8] - 2026-04-08 — Documentation Fix
|
|
1854
|
+
|
|
1855
|
+
### Fixed
|
|
1856
|
+
- **Installer comments updated** — example version references now point to v0.7.8
|
|
1857
|
+
|
|
1858
|
+
## [0.7.7] - 2026-04-08 — Installer Fixes
|
|
1859
|
+
|
|
1860
|
+
### Fixed
|
|
1861
|
+
- **Installers default to main branch** — install.sh and install.ps1 now use `main` instead of a hardcoded, non-existent tag
|
|
1862
|
+
- **macOS installer builds native helper** — install.sh now runs `./native/build.sh` on Darwin if Swift is available
|
|
1863
|
+
- **Version override support** — `VERSION=v0.7.7 curl ... | bash` or `$env:VERSION='v0.7.7'` to install specific release
|
|
1864
|
+
- **Auto-pull on update** — installers now run `git pull` after checkout to get latest changes
|
|
1865
|
+
|
|
1866
|
+
## [0.7.6] - 2026-04-08 — macOS Native Host App
|
|
1867
|
+
|
|
1868
|
+
### Added
|
|
1869
|
+
- **macOS Host App (ClawdCursorHost)** — new native Swift executable that runs as the app bundle's main process, owning all TCC permissions (Accessibility, Screen Recording) under a single app identity
|
|
1870
|
+
- **Localhost IPC server** — host app exposes `GET /health`, `GET /status`, `POST /rpc` on `127.0.0.1:3848` for CLI→host communication
|
|
1871
|
+
- **Token-based authentication** — `~/.clawdcursor/host-token` (mode 0600) secures the IPC channel
|
|
1872
|
+
- **Auto-launch/stop** — `clawdcursor start` ensures host is running; `clawdcursor stop` gracefully quits it
|
|
1873
|
+
- **New Swift helper methods** — `moveMouse`, `dragMouse`, `captureScreen` for smoother native macOS automation
|
|
1874
|
+
- **Menu bar presence** — host app shows 🐾 icon in menu bar for visibility
|
|
1875
|
+
|
|
1876
|
+
### Security
|
|
1877
|
+
- **Localhost-only binding** — IPC server uses `NWParameters.requiredLocalEndpoint` to bind to `127.0.0.1` only, rejecting connections from other machines
|
|
1878
|
+
- **Token file permissions** — host-token created with mode 0600 (owner read/write only)
|
|
1879
|
+
|
|
1880
|
+
### Changed
|
|
1881
|
+
- `src/native-helper.ts` — routes all macOS desktop operations through host IPC instead of direct stdio
|
|
1882
|
+
- `src/native-desktop.ts` — 11 platform-guarded code paths delegate to host on macOS
|
|
1883
|
+
- `src/index.ts` — start/stop commands manage host app lifecycle
|
|
1884
|
+
- `native/ClawdCursor.app/Contents/Info.plist` — bundle identifier changed to `com.clawdcursor.app`, executable to `ClawdCursorHost`
|
|
1885
|
+
|
|
1886
|
+
### Unchanged
|
|
1887
|
+
- **Windows/Linux** — all macOS code behind `IS_MAC && this.helper` guards; no behavior changes on other platforms
|
|
1888
|
+
- **172 tests pass** — full test suite unchanged
|
|
1889
|
+
|
|
1890
|
+
## [0.6.3] - 2026-03-01 — Universal Pipeline, Multi-App Workflows, Provider-Agnostic
|
|
1891
|
+
|
|
1892
|
+
### Added
|
|
1893
|
+
- **LLM-based universal task pre-processor** — one cheap text LLM call decomposes any natural language into `{app, navigate, task, contextHints}`, replacing brittle regex parsing
|
|
1894
|
+
- **Multi-app workflow support** — copy/paste between apps (e.g. Wikipedia → Notepad) with 6-checkpoint tracking: first_app_focused → first_app_action_done → content_copied → second_app_opened → content_pasted → result_visible
|
|
1895
|
+
- **Site-specific keyboard shortcuts** — Reddit (j/k/a/c), Twitter/X (j/k/l/t/r), YouTube (Space/f/m), Gmail (j/k/e/r/c), GitHub (s/t/l), Slack (Ctrl+k), plus generic hints
|
|
1896
|
+
- **OS-level default browser detection** — reads Windows registry (HKCU ProgId) or macOS LaunchServices instead of hardcoded Edge/Safari
|
|
1897
|
+
- **3 verification retries with step log analysis** — when verification fails, builds a digest of recent actions + checkpoint status so the vision LLM can fix the specific missed step
|
|
1898
|
+
- **Mixed-provider pipeline support** — e.g. kimi for text, anthropic for Computer Use, with per-layer API key resolution from OpenClaw auth-profiles
|
|
1899
|
+
- **`ComputerUseOverrides` interface** — apiKey, model, baseUrl per-layer for mixed-provider setups
|
|
1900
|
+
- **`resolveProviderApiKey()` helper** — reads OpenClaw auth-profiles to find the right API key per provider
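
The six-checkpoint tracking for multi-app workflows could be modelled along these lines. This is a sketch under assumptions: `CheckpointTracker` and its methods are hypothetical names, the checkpoint identifiers are the ones listed above, and the rule that copy and paste are only recorded on `Ctrl+C` / `Ctrl+V` follows the strict detection described in this release's fixes.

```typescript
const CHECKPOINTS = [
  "first_app_focused",
  "first_app_action_done",
  "content_copied",
  "second_app_opened",
  "content_pasted",
  "result_visible",
] as const;

type Checkpoint = (typeof CHECKPOINTS)[number];

// Hypothetical tracker: records which milestones of a copy/paste workflow were reached.
class CheckpointTracker {
  private readonly reached = new Set<Checkpoint>();

  mark(checkpoint: Checkpoint): void {
    this.reached.add(checkpoint);
  }

  // Strict detection: only the actual key combinations count as copy or paste.
  onKeyCombo(combo: string): void {
    const normalized = combo.toLowerCase();
    if (normalized === "ctrl+c") this.mark("content_copied");
    if (normalized === "ctrl+v") this.mark("content_pasted");
  }

  completionRatio(): number {
    return this.reached.size / CHECKPOINTS.length;
  }

  // Checkpoints still missing, in workflow order, e.g. for a verification digest.
  missing(): Checkpoint[] {
    return CHECKPOINTS.filter((checkpoint) => !this.reached.has(checkpoint));
  }
}
```

The `missing()` list is the kind of information a verification retry could hand to the vision LLM so that it repairs the specific step that was skipped.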
|
|
1901
|
+
|
|
1902
|
+
### Fixed
|
|
1903
|
+
- **Checkpoint system overhaul** — removed auto-termination (completionRatio ≥ 0.90 early exit and isComplete() mid-loop kill), strict detection: content_pasted requires Ctrl+V, content_copied requires Ctrl+C, second_app_opened detects any window switch universally
|
|
1904
|
+
- **Pipeline context passing** — `priorContext[]` accumulator flows from pre-processing through to Computer Use (no more amnesia between layers)
|
|
1905
|
+
- **Credential resolution order** — .clawdcursor-config → auth-profiles.json → openclaw.json (with template expansion) → env vars
|
|
1906
|
+
- **`loadPipelineConfig()` path resolution** — checks package dir first, then cwd (fixes global npm installs)
|
|
1907
|
+
- **Smart Interaction model lookup** — uses `PROVIDERS` registry instead of hardcoded model/baseUrl maps; fixes stale `claude-haiku-3-5-20241022` fallback
|
|
1908
|
+
- **Scroll behavior** — system prompts instruct PageDown/Space instead of tiny mouse scrolls; default scroll delta 3 → 15
|
|
1909
|
+
- **Provider-agnostic internals** — all comments and logs say "vision LLM" instead of "Claude"
|
|
1910
|
+
- **Verification retry limit** — max 3 retries prevents infinite verification loops
|
|
1911
|
+
- **Universal checkpoint detection** — no hardcoded app lists; `detectTaskType()` uses action patterns only
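
The credential resolution order listed above amounts to "first source with a usable key wins". A minimal sketch, assuming each source can be reduced to a lookup function: `CredentialSource`, `resolveApiKey`, and `fromRecord` are hypothetical helpers, and real sources would read files or environment variables rather than in-memory records.

```typescript
// Hypothetical abstraction over one place a key might live.
interface CredentialSource {
  name: string;
  lookup: (provider: string) => string | undefined;
}

interface ResolvedKey {
  apiKey: string;
  source: string;
}

// Walk the sources in priority order and return the first non-empty key.
function resolveApiKey(provider: string, sources: CredentialSource[]): ResolvedKey | undefined {
  for (const source of sources) {
    const apiKey = source.lookup(provider);
    if (apiKey !== undefined && apiKey.trim() !== "") {
      return { apiKey, source: source.name };
    }
  }
  return undefined;
}

// Test helper: wraps a plain record as a source.
function fromRecord(name: string, record: Record<string, string>): CredentialSource {
  return { name, lookup: (provider) => record[provider] };
}
```

Passing the sources as an ordered array keeps the priority explicit and makes it easy to log which file a key actually came from.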
|
|
1912
|
+
|
|
1913
|
+
### Changed
|
|
1914
|
+
- Pipeline architecture: LLM Pre-processor → Pre-open app + navigate → L0 Browser → L1 Action Router + Shortcuts → L1.5 Smart Interaction → L2 A11y Reasoner → L3 Computer Use
|
|
1915
|
+
- Pre-processor prompt hardened with NEVER rules (never summarize, never drop steps) and VALIDATION RULE
|
|
1916
|
+
- MULTI-APP WORKFLOWS section added to both Mac and Windows Computer Use system prompts
|
|
1917
|
+
- Checkpoint thresholds tightened: early completion 75% → 90%, skip-verification 50% → 80%
|
|
1918
|
+
|
|
1919
|
+
## [0.6.5] - 2026-02-28 — Checkpoint System, Task Completion Detection
|
|
1920
|
+
|
|
1921
|
+
### Added
|
|
1922
|
+
- **Checkpoint-based task completion** — Computer Use tracks milestones (compose opened → fields filled → send pressed → compose closed) and stops when all checkpoints are met. No more wasted calls after successful completion.
|
|
1923
|
+
- **Task type detection** — auto-classifies tasks (email, form, navigate, draw, file_save) and applies appropriate checkpoint templates.
|
|
1924
|
+
- **Smart early termination** — when Claude says "done" and ≥75% of checkpoints are confirmed, completion is accepted immediately.
|
|
1925
|
+
- **Auto-config on first run** — `clawdcursor start` auto-detects providers without needing `clawdcursor doctor`.
|
|
1926
|
+
- **Universal provider support** — any OpenAI-compatible endpoint works via `--base-url`.
|
|
1927
|
+
- **CLI model selection** — `--text-model` and `--vision-model` flags.
|
|
1928
|
+
|
|
1929
|
+
### Fixed
|
|
1930
|
+
- **Email domain extraction bug** — "send to user@hotmail.com" no longer navigates to hotmail.com. Email addresses are stripped before URL matching.
|
|
1931
|
+
- **Verification override bug** — verification no longer contradicts confirmed checkpoint completion; it is skipped when ≥50% of checkpoints are met.
|
|
1932
|
+
- **Context loss between layers** — Computer Use now receives full context of what pre-processing already did.
|
|
1933
|
+
- **Drawing quality** — minimum 50px drag distances enforced via system prompt.
|
|
1934
|
+
- **OpenClaw credential discovery** — multi-provider scan, template variable resolution, no false overrides.
|
|
1935
|
+
- **Pipeline gate** — Action Router always runs, shortcuts work everywhere.
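
The two thresholds mentioned in this release (accept "done" at ≥75% of checkpoints, skip verification at ≥50%) can be expressed as a small policy object. This is an illustrative sketch: `CompletionPolicy` and `evaluateCompletion` are hypothetical names, and the 0.6.3 entry notes that the thresholds were later tightened to 90% and 80%.

```typescript
// Hypothetical policy: fractions of confirmed checkpoints needed for each decision.
interface CompletionPolicy {
  acceptDoneAt: number;
  skipVerificationAt: number;
}

const POLICY_0_6_5: CompletionPolicy = { acceptDoneAt: 0.75, skipVerificationAt: 0.5 };

interface CompletionDecision {
  ratio: number;
  acceptDone: boolean;
  skipVerification: boolean;
}

function evaluateCompletion(
  modelSaysDone: boolean,
  met: number,
  total: number,
  policy: CompletionPolicy = POLICY_0_6_5,
): CompletionDecision {
  const ratio = total > 0 ? met / total : 0;
  return {
    ratio,
    // The model's "done" is only trusted once enough checkpoints back it up.
    acceptDone: modelSaysDone && ratio >= policy.acceptDoneAt,
    skipVerification: ratio >= policy.skipVerificationAt,
  };
}
```

Keeping the numbers in a policy object means a later release can change them in one place, for example by passing `{ acceptDoneAt: 0.9, skipVerificationAt: 0.8 }`.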
|
|
1936
|
+
|
|
1937
|
+
### Changed
|
|
1938
|
+
- Pipeline pre-processes "open X and Y" tasks — opens app via Action Router (free), then hands remaining task to deeper layers.
|
|
1939
|
+
- Smart Interaction detects visual loop tasks (draw, paint) and skips to Computer Use.
|
|
1940
|
+
- Computer Use system prompt includes Snap Assist handling and drawing guidelines.
|
|
1941
|
+
|
|
1942
|
+
## [0.6.2] - 2026-02-28 — Universal Provider Support, Auto-Config
|
|
1943
|
+
|
|
1944
|
+
### Added
|
|
1945
|
+
- **Auto-config on first run** — `clawdcursor start` auto-detects and configures providers without needing `clawdcursor doctor` first. Doctor is now optional for fine-tuning.
|
|
1946
|
+
- **Universal provider support** — any OpenAI-compatible endpoint works. Not limited to 7 hardcoded providers. Use `--base-url` + `--api-key` for custom endpoints.
|
|
1947
|
+
- **CLI model selection** — `--text-model` and `--vision-model` flags on start command.
|
|
1948
|
+
- **Dynamic OpenClaw provider mapping** — reads ALL providers from OpenClaw config, not just known ones. NVIDIA, Fireworks, Mistral, etc. work automatically.
|
|
1949
|
+
|
|
1950
|
+
### Changed
|
|
1951
|
+
- `clawdcursor start` now auto-runs setup if no config exists (non-interactive)
|
|
1952
|
+
- Provider detection accepts any provider name, falling back to OpenAI-compatible API
|
|
1953
|
+
- `detectProvider()` returns 'generic' for unknown providers instead of defaulting to 'openai'
|
|
1954
|
+
|
|
1955
|
+
## [0.6.1] - 2026-02-28 — Keyboard Shortcuts, Pipeline Fixes
|
|
1956
|
+
|
|
1957
|
+
### Added
|
|
1958
|
+
- **Keyboard shortcuts registry** (`src/shortcuts.ts`) — 30+ common actions mapped to direct keystrokes. Scroll, copy, paste, undo, reddit upvote/downvote, browser shortcuts, and more. Zero LLM calls.
|
|
1959
|
+
- **Fuzzy shortcut matching** — "scroll the page down" fuzzy-matches to scroll-down shortcut. Context-aware matching for social media actions.
|
|
1960
|
+
- **Router telemetry** — Action Router now logs match type, confidence, and shortcut hits.
|
|
1961
|
+
- **CDP→UIDriver fallback** — Smart Interaction falls back to accessibility tree automation when browser CDP path fails.
|
|
1962
|
+
- **Gmail, Outlook, Hotmail** added to Browser Layer site map.
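
To give a feel for how a phrase such as "scroll the page down" can resolve to a shortcut without an LLM call, here is a deliberately simple keyword-based matcher. It is a stand-in for whatever fuzzy matching `src/shortcuts.ts` actually performs: the `Shortcut` shape, the three sample entries, and their key bindings are assumptions made for this example.

```typescript
// Hypothetical registry entry: a shortcut fires when all its keywords appear in the task.
interface Shortcut {
  id: string;
  keywords: string[];
  keys: string;
}

const SAMPLE_SHORTCUTS: Shortcut[] = [
  { id: "scroll-down", keywords: ["scroll", "down"], keys: "PageDown" },
  { id: "scroll-up", keywords: ["scroll", "up"], keys: "PageUp" },
  { id: "copy", keywords: ["copy"], keys: "Ctrl+C" },
];

function matchShortcut(task: string, registry: Shortcut[] = SAMPLE_SHORTCUTS): Shortcut | undefined {
  // Tokenize loosely so filler words ("the", "page") do not prevent a match.
  const words = new Set(task.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0));
  let best: Shortcut | undefined;
  for (const candidate of registry) {
    const allPresent = candidate.keywords.every((keyword) => words.has(keyword));
    // Prefer the most specific entry (the one with the most keywords).
    if (allPresent && (best === undefined || candidate.keywords.length > best.keywords.length)) {
      best = candidate;
    }
  }
  return best;
}
```

A task that matches no entry returns `undefined`, which is the signal for the pipeline to fall through to a deeper layer.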
|
|
1963
|
+
|
|
1964
|
+
### Fixed
|
|
1965
|
+
- **Pipeline gate bug** — Action Router was gated behind `!isBrowserTask`, causing shortcuts to be skipped for browser-context tasks (e.g., "reddit upvote" matched browser regex but should use shortcut). Action Router now always runs after Browser Layer.
|
|
1966
|
+
- **URL extraction false positives** — "open gmail and send email to foo@bar.com" no longer extracts `bar.com`. URL extraction now isolates the navigation clause before matching.
|
|
1967
|
+
- **Reliable force-stop** — `clawdcursor stop` now force-kills lingering processes via PID file.
|
|
1968
|
+
- **Provider label inference** — startup logs now clearly show text and vision provider names separately.
|
|
1969
|
+
|
|
1970
|
+
### Changed
|
|
1971
|
+
- Pipeline order: Browser Layer (L0) → Action Router + Shortcuts (L1) → Smart Interaction (L1.5) → A11y Reasoner (L2) → Vision (L3). Action Router no longer gated.
|
|
1972
|
+
- `extractUrl()` uses navigation clause isolation instead of matching against full task text.
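
The false positive fixed here comes from treating the domain part of an email address as a navigation target. The sketch below shows one simple way to avoid that by removing email addresses before looking for a domain; it is not the project's `extractUrl()` (which, per the entry above, isolates the navigation clause), and both regular expressions and the name `extractDomainSketch` are assumptions for illustration.

```typescript
// Rough email pattern; good enough to blank out addresses before domain matching.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Rough domain pattern: optional scheme, dotted labels, alphabetic TLD.
const DOMAIN_RE = /\b(?:https?:\/\/)?((?:[a-z0-9-]+\.)+[a-z]{2,})/i;

function extractDomainSketch(task: string): string | undefined {
  // Step 1: strip email addresses so "foo@bar.com" cannot yield "bar.com".
  const withoutEmails = task.replace(EMAIL_RE, " ");
  // Step 2: look for a bare or schemed domain in what remains.
  const match = DOMAIN_RE.exec(withoutEmails);
  return match?.[1]?.toLowerCase();
}
```

For "open gmail and send email to foo@bar.com" nothing is extracted, so the task can be routed by app name instead of navigating to an unrelated site.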
|
|
1973
|
+
|
|
1974
|
+
## [0.6.0] - 2026-02-28 — Universal Provider Support, OpenClaw Integration
|
|
1975
|
+
|
|
1976
|
+
### Added
|
|
1977
|
+
- **OpenClaw credential integration** — auto-discovers all configured providers from OpenClaw's `auth-profiles.json` and `openclaw.json`. No separate API key needed when running as an OpenClaw skill.
|
|
1978
|
+
- **Universal provider support** — added Groq, Together AI, DeepSeek as first-class providers with profiles, env var detection, and key prefix recognition.
|
|
1979
|
+
- **Auto-detection as default** — provider defaults to `auto` instead of hardcoding Anthropic. Doctor picks the best available provider automatically.
|
|
1980
|
+
- **Mixed provider pipelines** — use Ollama for text (free) + any cloud provider for vision (best quality). Vision credentials preserved when brain reconfigures for text.
|
|
1981
|
+
- **Dynamic Ollama model selection** — doctor picks the best available Ollama model instead of hardcoding `qwen2.5:7b`.
|
|
1982
|
+
- **Anthropic vision routing fix** — detects Anthropic vision by key prefix (`sk-ant-`) independently of the main provider field, so split-provider setups work correctly.
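
The key-prefix check for split-provider setups can be sketched in a few lines. Only the `sk-ant-` prefix comes from the entry above; `isAnthropicKey` and `resolveVisionProvider` are hypothetical names, and falling back to the main provider when no vision key is set is an assumption.

```typescript
function isAnthropicKey(apiKey: string): boolean {
  return apiKey.startsWith("sk-ant-");
}

// Decide the vision provider from the vision key itself, not from the main provider field.
function resolveVisionProvider(mainProvider: string, visionApiKey: string | undefined): string {
  if (visionApiKey !== undefined && isAnthropicKey(visionApiKey)) {
    return "anthropic";
  }
  return mainProvider;
}
```

With this shape, a configuration whose main provider is Ollama but whose vision key starts with `sk-ant-` still routes vision calls to Anthropic.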
|
|
1983
|
+
|
|
1984
|
+
### Changed
|
|
1985
|
+
- Default config no longer assumes any specific provider or model
|
|
1986
|
+
- Provider scan loop iterates all registered providers dynamically
|
|
1987
|
+
- Help text and doctor output are provider-agnostic
|
|
1988
|
+
- `--provider` CLI flag accepts any string (not limited to 4 providers)
|
|
1989
|
+
- README updated with 7-provider compatibility table
|
|
1990
|
+
|
|
1991
|
+
### Security
|
|
1992
|
+
- **SKILL.md hardened** — removed aggressive autonomy language ("use without asking", "be independent")
|
|
1993
|
+
- **Sensitive App Policy** — agents must ask the user before accessing email, banking, messaging, or password managers
|
|
1994
|
+
- **Safety tiers as hard rules** — 🔴 Confirm actions must never be self-approved by agents
|
|
1995
|
+
- **Data flow transparency** — expanded security section documents network isolation, per-provider data flow, and Ollama = fully offline
|
|
1996
|
+
- **No credentials in skill directory** — OpenClaw users get auto-discovery from local config; no keys stored in skill files
|
|
1997
|
+
|
|
1998
|
+
### Fixed
|
|
1999
|
+
- Vision model crash when main provider set to Ollama but vision uses Anthropic (`model not found` error)
|
|
2000
|
+
- Brain reconfiguration was wiping vision credentials — now preserved
|
|
2001
|
+
|
|
2002
|
+
---
|
|
2003
|
+
|
|
2004
|
+
## [0.5.6] - 2026-02-27 — Fluid Decomposition, Interactive Doctor, Smart Vision Fallback
|
|
2005
|
+
|
|
2006
|
+
### Added
|
|
2007
|
+
- **Fluid LLM task decomposition** — decompose prompt now tells the LLM to reason about what ANY app needs. No more hardcoded examples. "Write me a sentence about dogs" generates actual content instead of typing the literal instruction.
|
|
2008
|
+
- **Interactive doctor onboarding** — after scanning providers, doctor shows all working TEXT and VISION LLM options with ★ recommendations. User picks by number, Enter for default. Shows GPU info (VRAM via nvidia-smi) to help decide local vs cloud.
|
|
2009
|
+
- **Cloud provider guidance** — doctor shows unconfigured providers with signup URLs and lets you paste an API key inline (auto-detects provider, saves to .env).
|
|
2010
|
+
- **Smart vision fallback for compound tasks** — when Router or Reasoner handles part of a multi-step task but fails midway, ALL remaining subtasks are bundled and handed to Computer Use (vision). Prevents false-success trapping in cheap layers.
|
|
2011
|
+
- **Ollama auto-detection** — brain auto-reconfigures to use local Ollama for decomposition when no cloud API key is set. `hasApiKey` now recognizes local LLMs.
|
|
2012
|
+
- **Compound task guard** — action router detects multi-step/compound tasks (commas, "then", "and then") and skips to deeper layers.
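
The compound task guard looks for the markers named above (commas, "then", "and then"). A minimal sketch of such a check, with the function name `isCompoundTask` and the exact regular expression being assumptions:

```typescript
// True when the task text contains a comma or a standalone "then" / "and then".
function isCompoundTask(task: string): boolean {
  return /,|\b(?:and\s+)?then\b/i.test(task);
}
```

The word boundaries matter: a word like "authenticate" contains the letters "then" but should not be treated as a step separator.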
|
|
2013
|
+
|
|
2014
|
+
### Fixed
|
|
2015
|
+
- **Case-preserving action router** — all regexes now match against the raw (unmodified) task text, so typed text and URLs are no longer lowercased.
|
|
2016
|
+
- **Flexible click matching** — `click Blank document` works without quotes (was requiring `click "Blank document"`). Single unified regex for quoted and unquoted element names.
|
|
2017
|
+
- **PowerShell encoding** — replaced emoji (🐾) and em dash (—) in task console title that broke on Windows PowerShell due to encoding.
|
|
2018
|
+
- **Stale config** — `.clawdcursor-config.json` now correctly reflects Ollama when doctor detects it (was stuck on Anthropic).
|
|
2019
|
+
- **Brain provider mismatch** — decomposition no longer calls Anthropic API when only Ollama is available.
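
The flexible click matching described above (one regex for quoted and unquoted element names, matched against the raw task text so case is preserved) might look roughly like this. The pattern and the name `parseClickTarget` are illustrative assumptions, not the router's actual regex.

```typescript
// One pattern, three alternatives: "double quoted", 'single quoted', or bare text.
const CLICK_RE = /^click\s+(?:"([^"]+)"|'([^']+)'|(.+))$/i;

function parseClickTarget(task: string): string | undefined {
  const match = CLICK_RE.exec(task.trim());
  if (match === null) return undefined;
  // Exactly one of the three groups participates in a successful match.
  const target = match[1] ?? match[2] ?? match[3];
  return target?.trim();
}
```

Only the `click` keyword is matched case-insensitively; the captured element name is returned exactly as typed.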
|
|
2020
|
+
|
|
2021
|
+
### Changed
|
|
2022
|
+
- **`npm run setup`** — new script that builds and registers `clawdcursor` as a global command via `npm link`. Works on Windows, macOS, and Linux.
|
|
2023
|
+
- **Stop/kill port validation** — port input is now sanitized (parseInt + range check 1-65535) to prevent command injection
|
|
2024
|
+
- **Kill health verification** — kill command now verifies `/health` returns a Clawd Cursor response before force-killing
|
|
2025
|
+
- **Install instructions updated** — README and docs now use `npm run setup`
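
The port sanitization for stop/kill can be sketched as below. The entry above mentions `parseInt` plus a 1-65535 range check; the additional digits-only test and the name `parsePort` are assumptions added here so that trailing shell text is rejected outright instead of being silently truncated.

```typescript
// Returns a valid TCP port number, or undefined if the input is not one.
function parsePort(input: string): number | undefined {
  const trimmed = input.trim();
  // Reject anything that is not purely digits (e.g. "3847; rm -rf /").
  if (!/^\d+$/.test(trimmed)) return undefined;
  const port = Number.parseInt(trimmed, 10);
  return port >= 1 && port <= 65535 ? port : undefined;
}
```

Callers then interpolate only the returned number into any command line, never the original string.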
|
|
2026
|
+
|
|
2027
|
+
### Test Results
|
|
2028
|
+
| Task | Pipeline Path | Steps | LLM Calls | Time | Result |
|
|
2029
|
+
|------|--------------|-------|-----------|------|--------|
|
|
2030
|
+
| Open Notepad | Action Router | 1 | 0 | 1.5s | ✅ |
|
|
2031
|
+
| Open Notepad + write haiku | Router → Smart Interaction → Computer Use | 6 | 7 | 58.8s | ✅ Verified |
|
|
2032
|
+
| Open Google Doc in Edge + write sentence | Browser → Computer Use | 17 | 9 | 78.8s | ✅ Verified |
|
|
2033
|
+
|
|
2034
|
+
## [0.5.5] - 2026-02-26 — Install/Uninstall, OpenClaw Auto-Registration, Doctor UX
|
|
2035
|
+
|
|
2036
|
+
### Added
|
|
2037
|
+
- **`clawdcursor install`** — one command to set up API key, configure pipeline, and register as OpenClaw skill
|
|
2038
|
+
- **`clawdcursor uninstall`** — clean removal of all config, data, and OpenClaw skill registration
|
|
2039
|
+
- **Doctor auto-registers as OpenClaw skill** — symlinks into `~/.openclaw/workspace/skills/clawdcursor`
|
|
2040
|
+
- **Doctor quick fix commands** — shows exact commands for missing text LLM and vision LLM in summary
|
|
2041
|
+
- **Dashboard favorites** — star commands to save them, click to re-run, persists across server restarts
|
|
2042
|
+
- **Credential detection** — warns when starring tasks that contain API keys or passwords
|
|
2043
|
+
- **OS tabs on website** — Windows/macOS/Linux with auto-detect
|
|
2044
|
+
- **Post-build help message** — shows all available commands after `npm run build`
|
|
2045
|
+
- **Dynamic OS detection** — system prompt uses actual OS instead of hardcoded "Windows 11" (thanks @molty)
|
|
2046
|
+
|
|
2047
|
+
### Fixed
|
|
2048
|
+
- **Windows skill detection** — removed `requires.bins` from SKILL.md; OpenClaw's `hasBinary()` doesn't handle Windows PATHEXT (`.exe`/`.cmd`), causing the skill to show as "missing" even when node is installed
|
|
2049
|
+
|
|
2050
|
+
### Changed
|
|
2051
|
+
- **SKILL.md rewritten** — agent identity shift framing, trigger lists, CDP direct path, async polling, error recovery
|
|
2052
|
+
- **Security hardened** — agents cannot self-approve confirm-tier actions, autonomous use scoped to read-only
|
|
2053
|
+
- **Privacy language clarified** — explicit per-provider data flow
|
|
2054
|
+
- **Website Get Started simplified** — 3 lines, commands shown in terminal post-build
|
|
2055
|
+
- **Anthropic text model updated** — `claude-haiku-4-5` (was `claude-3-5-haiku-20241022`)
|
|
2056
|
+
|
|
2057
|
+
## [0.5.4] - 2026-02-25 — SKILL.md Rewrite + Security Hardening
|
|
2058
|
+
|
|
2059
|
+
### Changed
|
|
2060
|
+
- **Privacy language clarified** — explicit per-provider data flow (Ollama = fully local, cloud = data to that API only)
|
|
2061
|
+
- **Added homepage and source URLs** to skill metadata
|
|
2062
|
+
- **Removed hard-coded paths** from SKILL.md
|
|
2063
|
+
- **Security section expanded** — includes localhost bind verification command
|
|
2064
|
+
- **Security scan addressed** — all flagged documentation gaps resolved
|
|
2065
|
+
|
|
2066
|
+
## [0.5.3] - 2026-02-25 — SKILL.md Rewrite for Agent Autonomy
|
|
2067
|
+
|
|
2068
|
+
### Changed
|
|
2069
|
+
- **SKILL.md rewritten** — agents now understand they have full desktop control and stop asking users to do things they can do themselves
|
|
2070
|
+
- **Agent identity shift framing** — blockquote at top overrides default "I can't do desktop things" behavior
|
|
2071
|
+
- **"When to Use This" trigger list** — comprehensive decision framework for when to reach for Clawd Cursor
|
|
2072
|
+
- **Two paths documented** — REST API (port 3847) for full desktop control, CDP Direct (port 9222) for fast browser reads
|
|
2073
|
+
- **Async flow clarified** — concrete polling pattern agents can follow step-by-step
|
|
2074
|
+
- **Error recovery table** — 8 common problems with exact solutions
|
|
2075
|
+
- **Expanded task examples** — cross-app workflows, data extraction, verification scenarios
|
|
2076
|
+
- **README** — added OpenClaw Integration section
|
|
2077
|
+
|
|
2078
|
+
## [0.5.2] - 2026-02-25 — Web Dashboard + Browser Foreground Focus
|
|
2079
|
+
|
|
2080
|
+
### Added
|
|
2081
|
+
- **Web Dashboard** — full single-page UI served at `GET /` (port 3847). Task submission, real-time logs, status indicators, approve/reject for safety confirmations, kill switch. Dark theme, fully responsive, zero external dependencies.
|
|
2082
|
+
- **`clawdcursor dashboard`** — CLI command to open the dashboard in your default browser
|
|
2083
|
+
- **`clawdcursor kill`** — CLI command to send a stop signal to the running server
|
|
2084
|
+
- **`GET /logs`** — API endpoint returning last 200 log entries with timestamps and levels
|
|
2085
|
+
- **Browser foreground focus** — Playwright navigation now brings Chrome to the front via `page.bringToFront()` + OS-level window activation (PowerShell `SetForegroundWindow` on Windows, `osascript` on macOS). The AI acts like a visible cursor — you see everything it does.
|
|
2086
|
+
- **Console hook** — `hookConsole()` intercepts all server logs for the dashboard log feed with auto-classification (error/success/warn/info)
|
|
2087
|
+
|
|
2088
|
+
### Changed
|
|
2089
|
+
- **Smart task handoff** — Browser layer no longer uses regex word lists to detect multi-step tasks. Pure navigation ("open youtube") completes in browser layer; anything more complex falls through to SmartInteraction where the LLM plans the steps. No more missed verbs.
|
|
2090
|
+
|
|
2091
|
+
### Architecture
|
|
2092
|
+
```
|
|
2093
|
+
Layer 0: Browser (Playwright) — navigate + foreground focus
|
|
2094
|
+
↓ more than navigation? → fall through
|
|
2095
|
+
Layer 1: Action Router — regex patterns, zero LLM calls
|
|
2096
|
+
↓ no match? → fall through
|
|
2097
|
+
Layer 1.5: Smart Interaction — 1 LLM call plans steps, CDP/UIDriver executes
|
|
2098
|
+
↓ failed? → fall through
|
|
2099
|
+
Layer 2: Accessibility Reasoner — reads UI tree, cheap LLM
|
|
2100
|
+
↓ failed? → fall through
|
|
2101
|
+
Layer 3: Screenshot + Vision — full screenshot, Computer Use API
|
|
2102
|
+
```
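
The fall-through behaviour in the diagram above can be sketched as a loop over layers ordered from cheapest to most expensive. This is a simplified model, not the project's pipeline: `Layer`, `LayerResult`, and `runPipeline` are hypothetical, and real layers would be asynchronous and carry context between them.

```typescript
interface LayerResult {
  handled: boolean;
  detail: string;
}

// Hypothetical layer: a name plus a function that either handles the task or declines.
interface Layer {
  name: string;
  run: (task: string) => LayerResult;
}

interface PipelineOutcome {
  layer: string;
  detail: string;
}

// Try each layer in order; the first one that handles the task wins.
function runPipeline(task: string, layers: Layer[]): PipelineOutcome | undefined {
  for (const layer of layers) {
    const result = layer.run(task);
    if (result.handled) {
      return { layer: layer.name, detail: result.detail };
    }
    // Not handled: fall through to the next (more expensive) layer.
  }
  return undefined;
}
```

Ordering the array as Browser, Action Router, Smart Interaction, Accessibility Reasoner, Vision reproduces the diagram: cheap layers get the first chance, and the screenshot layer only runs when everything before it declines.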
|
|
2103
|
+
|
|
2104
|
+
## [0.5.1] - 2026-02-23 — HD Screenshots + Focus Stability
|
|
2105
|
+
|
|
2106
|
+
### Fixed
|
|
2107
|
+
- **HD screenshots** — LLM resolution increased from 1024px to 1280px (scale 2x instead of 2.5x). Claude can now reliably identify toolbar icons, buttons, and small UI elements.
|
|
2108
|
+
- **JPEG quality** — bumped from 55 to 65 for clearer icon identification
|
|
2109
|
+
- **Window focus stability** — `Win+D` minimizes all windows before task execution, preventing the Clawd terminal from stealing focus from target apps
|
|
2110
|
+
- **Paint drawing reliability** — pencil tool guidance in system prompt, mandatory checkpoint after tool selection
|
|
2111
|
+
- **Stale file cleanup** — restored `get-windows.ps1` shim (still referenced by accessibility.ts), removed dead `setup.ps1` and `get-ui-tree.ps1`
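
The scale factors quoted in the HD screenshot entry are consistent with a 2560 px-wide display: 2560 / 1024 = 2.5 and 2560 / 1280 = 2. That display width is an inference, not something the changelog states. Under that assumption, mapping a coordinate the LLM reports on the downscaled image back to the real screen is a single multiplication, sketched here with hypothetical helper names:

```typescript
interface Point {
  x: number;
  y: number;
}

// Ratio between the real screen width and the width of the image sent to the LLM.
function scaleFactor(screenWidth: number, llmWidth: number): number {
  return screenWidth / llmWidth;
}

// Convert a point on the downscaled screenshot into real screen pixels.
function toScreenCoords(point: Point, factor: number): Point {
  return { x: Math.round(point.x * factor), y: Math.round(point.y * factor) };
}
```

A smaller factor means each pixel the model sees covers fewer real pixels, which is why small toolbar icons become easier to identify at 1280 px.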
|
|
2112
|
+
|
|
2113
|
+
### Performance (Paint stickman benchmark)
|
|
2114
|
+
| Metric | v0.5.0 | v0.5.1 |
|
|
2115
|
+
|--------|--------|--------|
|
|
2116
|
+
| Time | ~250s | **55s** |
|
|
2117
|
+
| API calls | 30 | **6** |
|
|
2118
|
+
| Success rate | ~50% | ~90% |
|
|
2119
|
+
|
|
2120
|
+
## [0.5.0] - 2026-02-23 — Smart Pipeline + Doctor + Batch Execution
|
|
2121
|
+
|
|
2122
|
+
### Added
|
|
2123
|
+
- **`clawdcursor doctor`** — auto-diagnoses setup, tests models, configures optimal pipeline
|
|
2124
|
+
- **3-layer pipeline** — Action Router → Accessibility Reasoner → Screenshot fallback
|
|
2125
|
+
- **Layer 2: Accessibility Reasoner** (`src/a11y-reasoner.ts`) — text-only LLM reads the UI tree, no screenshots needed. Uses cheap models (Haiku, Qwen, GPT-4o-mini).
|
|
2126
|
+
- **Batch action execution** — Claude returns multiple actions per response (3.6 avg), skipping screenshots between batched actions. Drawing tasks execute 10+ actions in a single API call.
|
|
2127
|
+
- **Focus hints** — each screenshot includes a FOCUS directive telling Claude where to look, reducing output tokens and decision time
|
|
2128
|
+
- **Auto-maximize** — apps launched via Action Router are automatically maximized (`Win+Up`) for consistent layout
|
|
2129
|
+
- **Region capture** — `captureRegionForLLM()` crops screenshots to specific areas (2-30KB vs 58KB full)
|
|
2130
|
+
- **Checkpoint strategy** — screenshots only after critical state changes (app open, dialog appear), not after every action
|
|
2131
|
+
- **Multi-provider support** — Anthropic, OpenAI, Ollama (local/free), Kimi. Same codebase, auto-detected.
|
|
2132
|
+
- **Provider model map** (`src/providers.ts`) — auto-selects cheap/expensive models per provider
|
|
2133
|
+
- **Self-healing** — doctor falls back if a model is unavailable (e.g., Haiku → Qwen). Circuit breaker disables failing layers at runtime.
|
|
2134
|
+
- **Streaming LLM responses** — early JSON return saves 1-3s per call
|
|
2135
|
+
- **Combined accessibility script** (`scripts/get-screen-context.ps1`) — 1 PowerShell spawn instead of 3
|
|
2136
|
+
- **Benchmark harness** (`test-perf-comparison.ts`)
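
The runtime circuit breaker mentioned under self-healing can be illustrated with a counter of consecutive failures per layer. The changelog does not say how the breaker decides to trip, so the threshold of three failures, the reset on success, and the `LayerBreaker` class itself are all assumptions made for this sketch.

```typescript
// Hypothetical breaker: a layer is disabled after `threshold` consecutive failures.
class LayerBreaker {
  private readonly failures = new Map<string, number>();
  private readonly threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  recordSuccess(layer: string): void {
    // A success clears the streak.
    this.failures.set(layer, 0);
  }

  recordFailure(layer: string): void {
    this.failures.set(layer, (this.failures.get(layer) ?? 0) + 1);
  }

  isDisabled(layer: string): boolean {
    return (this.failures.get(layer) ?? 0) >= this.threshold;
  }
}
```

A pipeline would consult `isDisabled()` before invoking a layer and skip straight to the next one when the breaker is open, avoiding repeated calls to a model that is known to be failing.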
|
|
2137
|
+
|
|
2138
|
+
### Performance
|
|
2139
|
+
- Screenshots: 120KB → ~80KB, 1280px target (HD for reliable icon identification)
|
|
2140
|
+
- JPEG quality: 70 → 65
|
|
2141
|
+
- Delays: 200-1500ms → 50-600ms across the board
|
|
2142
|
+
- System prompts: ~60% smaller (fewer tokens per call)
|
|
2143
|
+
- Accessibility tree: filtered to interactive elements only, 3000 char cap
|
|
2144
|
+
- Taskbar cache: 30s TTL (was queried every call)
|
|
2145
|
+
- Screen context cache: 500ms → 2s TTL
|
|
2146
|
+
|
|
2147
|
+
### Benchmarks
|
|
2148
|
+
|
|
2149
|
+
| Task | v0.4 | v0.5 (Ollama, $0) | v0.5 (Anthropic) | v0.5 + Batch |
|
|
2150
|
+
|------|------|--------|---------|---------|
|
|
2151
|
+
| Calculator | 43s | 2.6s | 20.1s | — |
|
|
2152
|
+
| Notepad | 73s | 2.0s | 54.2s | — |
|
|
2153
|
+
| File Explorer | 53s | 1.9s | 22.1s | — |
|
|
2154
|
+
| Paint stickman | ~250s (30 calls) | — | ~124s (19 calls) | **101s (11 calls)** |
|
|
2155
|
+
| GitHub profile | — | — | ~106s (15 calls) | — |
|
|
2156
|
+
|
|
2157
|
+
## [0.4.0] - 2026-02-22 — Native Desktop Control
|
|
2158
|
+
|
|
2159
|
+
**VNC removed.** Clawd Cursor now controls the desktop natively via @nut-tree-fork/nut-js. No VNC server required.
|
|
2160
|
+
|
|
2161
|
+
### Breaking Changes
|
|
2162
|
+
- `--vnc-host`, `--vnc-port`, `--vnc-password` CLI flags removed
|
|
2163
|
+
- `VNC_PASSWORD`, `VNC_HOST`, `VNC_PORT` environment variables no longer used
|
|
2164
|
+
- `rfb2` dependency removed
|
|
2165
|
+
- `setup.ps1` no longer installs TightVNC
|
|
2166
|
+
|
|
2167
|
+
### Added
|
|
2168
|
+
- `NativeDesktop` class (`src/native-desktop.ts`) — drop-in replacement for VNCClient
|
|
2169
|
+
- Direct screen capture via @nut-tree-fork/nut-js (~50ms per screenshot vs ~850ms over VNC)
|
|
2170
|
+
- Direct mouse/keyboard control via OS-level APIs
|
|
2171
|
+
- Simplified onboarding: `npm install && npm start`
|
|
2172
|
+
|
|
2173
|
+
### Performance
|
|
2174
|
+
- Screenshots: ~850ms → ~50ms (17× faster)
|
|
2175
|
+
- Connect time: ~200ms → ~38ms (5× faster)
|
|
2176
|
+
- Simple task (Google Docs sentence): ~120s → ~102s
|
|
2177
|
+
- Complex task (GitHub → Notepad → save): ~200s → ~156s
|
|
2178
|
+
|
|
2179
|
+
### Removed
|
|
2180
|
+
- VNC server dependency (TightVNC)
|
|
2181
|
+
- `rfb2` npm package
|
|
2182
|
+
- VNC-related CLI flags and environment variables
|
|
2183
|
+
- BGRA→RGBA color swap (nut-js returns RGBA natively)
|
|
2184
|
+
|
|
2185
|
+
## [0.3.3] - 2025-03-15
|
|
2186
|
+
|
|
2187
|
+
### Bulletproof Headless Setup
|
|
2188
|
+
- `setup.ps1` now completes end-to-end in a single run on fresh systems, even in non-interactive (headless) AI agent shells
|
|
2189
|
+
- Generate a random VNC password when `--vnc-password` is not provided in a non-interactive session
|
|
2190
|
+
- Replace `Start-Process -NoNewWindow -Wait` with `-PassThru -WindowStyle Hidden` + try/catch (msiexec crash fix)
|
|
2191
|
+
- Wrap `Start-Service` in its own try/catch (post-install crash fix)
|
|
2192
|
+
- Replace all emoji with ASCII tags for cp1252 headless terminal compatibility
|
|
2193
|
+
|
|
2194
|
+
## [0.3.1] - 2025-03-10
|
|
2195
|
+
|
|
2196
|
+
### SKILL.md Security Hardening
|
|
2197
|
+
- Added YAML frontmatter, explicit credential declarations, privacy disclosure, and security considerations for ClawHub publishing.
|
|
2198
|
+
|
|
2199
|
+
## [0.3.0] - 2025-03-01
|
|
2200
|
+
|
|
2201
|
+
### Performance Optimizations (~70% faster)
|
|
2202
|
+
- Screenshot hash cache — skips LLM calls when the screen hasn't changed
|
|
2203
|
+
- Adaptive VNC frame wait — captures in ~200ms instead of fixed 800ms
|
|
2204
|
+
- Parallel screenshot + accessibility fetch — runs concurrently via Promise.all
|
|
2205
|
+
- Accessibility context cache — 500ms TTL eliminates redundant PowerShell queries
|
|
2206
|
+
- Async debug writes — no longer blocks the event loop
|
|
2207
|
+
- Exponential backoff with jitter — better retry resilience for API calls
|
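The screenshot hash cache in the list above can be pictured as a comparison of digests between consecutive frames. A minimal sketch, assuming SHA-256 and a hypothetical `ScreenChangeDetector` class (the changelog does not say which hash or names the package uses):

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: remembers the digest of the last screenshot it saw.
class ScreenChangeDetector {
  private lastHash: string | null = null;

  // Returns true when the frame differs from the previous one.
  hasChanged(frame: Uint8Array): boolean {
    const hash = createHash("sha256").update(frame).digest("hex");
    const changed = hash !== this.lastHash;
    this.lastHash = hash;
    return changed;
  }
}
```

A caller would skip the LLM request whenever `hasChanged(frame)` returns `false`.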
|
2208
|
+
|
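Exponential backoff with jitter, as named in the list above, doubles a delay ceiling on each retry and then picks a random delay below it. A minimal sketch using the "full jitter" variant; the function name, the 500ms base, and the 30s cap are assumptions and not values from the package:

```typescript
// Hypothetical helper: delay before retry number `attempt` (0-based).
// baseMs and capMs are illustrative defaults, not values from the changelog.
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  // The ceiling doubles each attempt until it reaches the cap.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Full jitter: choose uniformly in [0, ceiling).
  return Math.floor(random() * ceiling);
}
```

Randomizing the delay spreads retries from concurrent callers so they do not hit the API in lockstep.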
|
2209
|
+
## [0.2.0] - 2025-02-21
|
|
2210
|
+
|
|
2211
|
+
### 🚀 Major: Anthropic Computer Use API
|
|
2212
|
+
|
|
2213
|
+
Clawd Cursor now supports Anthropic's native Computer Use API (`computer_20250124`) as the **primary execution path**. This is a fundamentally different approach — the full task goes directly to Claude with native computer use tools. No decomposition, no routing. Claude sees screenshots, plans, and executes natively.
|
|
2214
|
+
|
|
2215
|
+
### Dual Execution Paths
|
|
2216
|
+
|
|
2217
|
+
The agent now has two separate code paths selected by provider:
|
|
2218
|
+
|
|
2219
|
+
- **Path A — Computer Use API** (`--provider anthropic`): Full task sent to Claude with `computer_20250124` tool. Claude sees the screen, plans multi-step sequences, and executes them natively. Handles complex, multi-app workflows reliably.
|
|
2220
|
+
- **Path B — Decompose + Action Router** (`--provider openai` / offline): Original approach from v0.1.0. Parse task → subtasks → Action Router (UI Automation, zero LLM) → Vision fallback. Faster and cheaper for simple tasks, works without an API key.
|
|
2221
|
+
|
|
2222
|
+
### Added
|
|
2223
|
+
|
|
2224
|
+
- **Anthropic Computer Use integration** — native `computer_20250124` tool type with `anthropic-beta: computer-use-2025-01-24` header
|
|
2225
|
+
- **Adaptive delays** — per-action timing: 1000ms for app launch, 800ms for navigation, 100ms for typing, 300ms default
|
|
2226
|
+
- **Verification hints** — post-action verification prompts after each Computer Use step
|
|
2227
|
+
- **Mouse drag** — `mouseDrag`, `mouseDown`, `mouseUp` with smooth interpolation between points
|
|
2228
|
+
- **Bulletproof system prompt** — planning rules, ctrl+l for URL navigation, recovery strategies for failed actions
|
|
2229
|
+
- **Display scaling** — automatic resolution scaling to 1280×720 for Computer Use API compatibility
|
|
2230
|
+
- **Vision model** — `claude-sonnet-4-20250514` for Computer Use path
|
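The adaptive delays entry in the list above amounts to a lookup from action type to wait time. A minimal sketch: the millisecond values are the ones stated in the entry, while the action labels and the `delayFor` function are hypothetical:

```typescript
// Action labels are hypothetical; the millisecond values come from the changelog entry.
const DELAY_MS = new Map<string, number>([
  ["launch", 1000],  // app launch
  ["navigate", 800], // navigation
  ["type", 100],     // typing
]);
const DEFAULT_DELAY_MS = 300;

// Returns the wait time for an action, falling back to the default.
function delayFor(kind: string): number {
  return DELAY_MS.get(kind) ?? DEFAULT_DELAY_MS;
}
```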
|
2231
|
+
|
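Display scaling to 1280×720 implies that coordinates returned by the model must be mapped back to real screen pixels. A minimal sketch, assuming each axis is scaled independently with no letterboxing; `toScreen` and `toApi` are hypothetical names, not the package's API:

```typescript
interface Size {
  width: number;
  height: number;
}

// Resolution stated in the changelog entry for the Computer Use path.
const API_SIZE: Size = { width: 1280, height: 720 };

// Maps a point in 1280×720 API space to real screen pixels.
function toScreen(x: number, y: number, screen: Size): { x: number; y: number } {
  return {
    x: Math.round((x * screen.width) / API_SIZE.width),
    y: Math.round((y * screen.height) / API_SIZE.height),
  };
}

// Maps a real screen point into 1280×720 API space.
function toApi(x: number, y: number, screen: Size): { x: number; y: number } {
  return {
    x: Math.round((x * API_SIZE.width) / screen.width),
    y: Math.round((y * API_SIZE.height) / screen.height),
  };
}
```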
|
2232
|
+
### Test Results
|
|
2233
|
+
|
|
2234
|
+
| Task | Time | API Calls | Result |
|
|
2235
|
+
|------|------|-----------|--------|
|
|
2236
|
+
| Google Docs: open Chrome, go to Docs, write a paragraph | 187s | 14 | ✅ All succeeded |
|
|
2237
|
+
| GitHub: open Chrome, navigate to profile, screenshot | 102s | — | ✅ All succeeded |
|
|
2238
|
+
| Notepad: open, write haiku, save to desktop | ~180s | — | ✅ File saved correctly |
|
|
2239
|
+
| Paint: draw a stick figure | ~90s | 16 | ✅ Drawing completed |
|
|
2240
|
+
|
|
2241
|
+
### Breaking Changes
|
|
2242
|
+
|
|
2243
|
+
- **Provider selection now determines execution path.** `--provider anthropic` uses Computer Use API (Path A). `--provider openai` or no provider uses the original Decompose + Action Router pipeline (Path B). This is a fundamental change in behavior — the same task will execute via completely different code paths depending on the provider.
|
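The provider rule above can be sketched as a single selector. This is an illustration only: `selectPath` and the path labels are hypothetical, and treating every provider other than `anthropic` as Path B is an assumption (the entry only mentions `openai` and no provider):

```typescript
type ExecutionPath = "computer-use" | "action-router";

// Hypothetical selector: "anthropic" picks Path A, anything else picks Path B.
function selectPath(provider?: string): ExecutionPath {
  return provider === "anthropic" ? "computer-use" : "action-router";
}
```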
|
2244
|
+
|
|
2245
|
+
### Performance Characteristics
|
|
2246
|
+
|
|
2247
|
+
| | Path A (Computer Use) | Path B (Action Router) |
|
|
2248
|
+
|---|---|---|
|
|
2249
|
+
| Best for | Complex multi-step tasks | Simple single-action tasks |
|
|
2250
|
+
| Reliability | Very high | Good for supported patterns |
|
|
2251
|
+
| Speed | ~90–190s for complex tasks | ~2s for simple tasks |
|
|
2252
|
+
| Cost | Higher (multiple API calls with screenshots) | Lower (1 text call or zero) |
|
|
2253
|
+
| Offline | No | Yes (for common patterns) |
|
|
2254
|
+
|
|
2255
|
+
## [0.1.0] - 2025-01-15
|
|
2256
|
+
|
|
2257
|
+
### Initial Release
|
|
2258
|
+
|
|
2259
|
+
- Action Router with Windows UI Automation — 80% of common tasks with zero LLM calls
|
|
2260
|
+
- Vision fallback for complex/unfamiliar UI
|
|
2261
|
+
- Smart task decomposition (single text-only LLM call)
|
|
2262
|
+
- Three-tier safety system (Auto / Preview / Confirm)
|
|
2263
|
+
- REST API and CLI interface
|
|
2264
|
+
- Windows setup script
|