@mseep/clawdcursor 1.5.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (354)
  1. package/CHANGELOG.md +2264 -0
  2. package/LICENSE +21 -0
  3. package/README.md +385 -0
  4. package/SECURITY.md +44 -0
  5. package/SKILL.md +503 -0
  6. package/dist/core/agent-loop/agent.d.ts +42 -0
  7. package/dist/core/agent-loop/agent.js +1023 -0
  8. package/dist/core/agent-loop/agent.js.map +1 -0
  9. package/dist/core/agent-loop/batch-tool.d.ts +25 -0
  10. package/dist/core/agent-loop/batch-tool.js +218 -0
  11. package/dist/core/agent-loop/batch-tool.js.map +1 -0
  12. package/dist/core/agent-loop/coord-scale.d.ts +72 -0
  13. package/dist/core/agent-loop/coord-scale.js +89 -0
  14. package/dist/core/agent-loop/coord-scale.js.map +1 -0
  15. package/dist/core/agent-loop/focus-guard.d.ts +24 -0
  16. package/dist/core/agent-loop/focus-guard.js +29 -0
  17. package/dist/core/agent-loop/focus-guard.js.map +1 -0
  18. package/dist/core/agent-loop/project-mcp.d.ts +97 -0
  19. package/dist/core/agent-loop/project-mcp.js +253 -0
  20. package/dist/core/agent-loop/project-mcp.js.map +1 -0
  21. package/dist/core/agent-loop/prompt.d.ts +45 -0
  22. package/dist/core/agent-loop/prompt.js +426 -0
  23. package/dist/core/agent-loop/prompt.js.map +1 -0
  24. package/dist/core/agent-loop/tool-meta.d.ts +93 -0
  25. package/dist/core/agent-loop/tool-meta.js +651 -0
  26. package/dist/core/agent-loop/tool-meta.js.map +1 -0
  27. package/dist/core/agent-loop/tools.d.ts +38 -0
  28. package/dist/core/agent-loop/tools.js +2134 -0
  29. package/dist/core/agent-loop/tools.js.map +1 -0
  30. package/dist/core/agent-loop/types.d.ts +170 -0
  31. package/dist/core/agent-loop/types.js +12 -0
  32. package/dist/core/agent-loop/types.js.map +1 -0
  33. package/dist/core/agent.d.ts +51 -0
  34. package/dist/core/agent.js +245 -0
  35. package/dist/core/agent.js.map +1 -0
  36. package/dist/core/app-categories.d.ts +67 -0
  37. package/dist/core/app-categories.js +108 -0
  38. package/dist/core/app-categories.js.map +1 -0
  39. package/dist/core/banner.d.ts +70 -0
  40. package/dist/core/banner.js +245 -0
  41. package/dist/core/banner.js.map +1 -0
  42. package/dist/core/classify/capability.d.ts +45 -0
  43. package/dist/core/classify/capability.js +78 -0
  44. package/dist/core/classify/capability.js.map +1 -0
  45. package/dist/core/decompose/llm-decomposer.d.ts +35 -0
  46. package/dist/core/decompose/llm-decomposer.js +156 -0
  47. package/dist/core/decompose/llm-decomposer.js.map +1 -0
  48. package/dist/core/decompose/parser.d.ts +27 -0
  49. package/dist/core/decompose/parser.js +101 -0
  50. package/dist/core/decompose/parser.js.map +1 -0
  51. package/dist/core/observability/correlation.d.ts +19 -0
  52. package/dist/core/observability/correlation.js +36 -0
  53. package/dist/core/observability/correlation.js.map +1 -0
  54. package/dist/core/observability/cost-meter.d.ts +51 -0
  55. package/dist/core/observability/cost-meter.js +134 -0
  56. package/dist/core/observability/cost-meter.js.map +1 -0
  57. package/dist/core/observability/logger.d.ts +61 -0
  58. package/dist/core/observability/logger.js +550 -0
  59. package/dist/core/observability/logger.js.map +1 -0
  60. package/dist/core/router/aliases.d.ts +50 -0
  61. package/dist/core/router/aliases.js +104 -0
  62. package/dist/core/router/aliases.js.map +1 -0
  63. package/dist/core/router/normalize.d.ts +41 -0
  64. package/dist/core/router/normalize.js +80 -0
  65. package/dist/core/router/normalize.js.map +1 -0
  66. package/dist/core/safety.d.ts +126 -0
  67. package/dist/core/safety.js +568 -0
  68. package/dist/core/safety.js.map +1 -0
  69. package/dist/core/sense/a11y-resolver.d.ts +73 -0
  70. package/dist/core/sense/a11y-resolver.js +76 -0
  71. package/dist/core/sense/a11y-resolver.js.map +1 -0
  72. package/dist/core/sense/fingerprint.d.ts +41 -0
  73. package/dist/core/sense/fingerprint.js +123 -0
  74. package/dist/core/sense/fingerprint.js.map +1 -0
  75. package/dist/core/sense/rank.d.ts +70 -0
  76. package/dist/core/sense/rank.js +192 -0
  77. package/dist/core/sense/rank.js.map +1 -0
  78. package/dist/core/sense/reactive-check.d.ts +40 -0
  79. package/dist/core/sense/reactive-check.js +48 -0
  80. package/dist/core/sense/reactive-check.js.map +1 -0
  81. package/dist/core/sense/snapshot.d.ts +19 -0
  82. package/dist/core/sense/snapshot.js +100 -0
  83. package/dist/core/sense/snapshot.js.map +1 -0
  84. package/dist/core/sense/types.d.ts +66 -0
  85. package/dist/core/sense/types.js +9 -0
  86. package/dist/core/sense/types.js.map +1 -0
  87. package/dist/core/sense/ui-map-anchors.d.ts +7 -0
  88. package/dist/core/sense/ui-map-anchors.js +24 -0
  89. package/dist/core/sense/ui-map-anchors.js.map +1 -0
  90. package/dist/core/sense/ui-map-elements.d.ts +5 -0
  91. package/dist/core/sense/ui-map-elements.js +33 -0
  92. package/dist/core/sense/ui-map-elements.js.map +1 -0
  93. package/dist/core/sense/ui-map-find.d.ts +56 -0
  94. package/dist/core/sense/ui-map-find.js +153 -0
  95. package/dist/core/sense/ui-map-find.js.map +1 -0
  96. package/dist/core/sense/ui-map-fuse.d.ts +4 -0
  97. package/dist/core/sense/ui-map-fuse.js +44 -0
  98. package/dist/core/sense/ui-map-fuse.js.map +1 -0
  99. package/dist/core/sense/ui-map-geom.d.ts +3 -0
  100. package/dist/core/sense/ui-map-geom.js +16 -0
  101. package/dist/core/sense/ui-map-geom.js.map +1 -0
  102. package/dist/core/sense/ui-map-holder.d.ts +58 -0
  103. package/dist/core/sense/ui-map-holder.js +87 -0
  104. package/dist/core/sense/ui-map-holder.js.map +1 -0
  105. package/dist/core/sense/ui-map-normalize.d.ts +19 -0
  106. package/dist/core/sense/ui-map-normalize.js +65 -0
  107. package/dist/core/sense/ui-map-normalize.js.map +1 -0
  108. package/dist/core/sense/ui-map-render.d.ts +4 -0
  109. package/dist/core/sense/ui-map-render.js +34 -0
  110. package/dist/core/sense/ui-map-render.js.map +1 -0
  111. package/dist/core/sense/ui-map-resolve.d.ts +41 -0
  112. package/dist/core/sense/ui-map-resolve.js +59 -0
  113. package/dist/core/sense/ui-map-resolve.js.map +1 -0
  114. package/dist/core/sense/ui-map-types.d.ts +66 -0
  115. package/dist/core/sense/ui-map-types.js +11 -0
  116. package/dist/core/sense/ui-map-types.js.map +1 -0
  117. package/dist/core/sense/ui-map.d.ts +29 -0
  118. package/dist/core/sense/ui-map.js +113 -0
  119. package/dist/core/sense/ui-map.js.map +1 -0
  120. package/dist/core/verify/assertions.d.ts +132 -0
  121. package/dist/core/verify/assertions.js +284 -0
  122. package/dist/core/verify/assertions.js.map +1 -0
  123. package/dist/index.d.ts +21 -0
  124. package/dist/index.js +24 -0
  125. package/dist/index.js.map +1 -0
  126. package/dist/llm/browser-config.d.ts +36 -0
  127. package/dist/llm/browser-config.js +83 -0
  128. package/dist/llm/browser-config.js.map +1 -0
  129. package/dist/llm/client.d.ts +268 -0
  130. package/dist/llm/client.js +1094 -0
  131. package/dist/llm/client.js.map +1 -0
  132. package/dist/llm/config.d.ts +79 -0
  133. package/dist/llm/config.js +375 -0
  134. package/dist/llm/config.js.map +1 -0
  135. package/dist/llm/credentials.d.ts +35 -0
  136. package/dist/llm/credentials.js +491 -0
  137. package/dist/llm/credentials.js.map +1 -0
  138. package/dist/llm/external-creds.d.ts +42 -0
  139. package/dist/llm/external-creds.js +169 -0
  140. package/dist/llm/external-creds.js.map +1 -0
  141. package/dist/llm/providers.d.ts +123 -0
  142. package/dist/llm/providers.js +717 -0
  143. package/dist/llm/providers.js.map +1 -0
  144. package/dist/paths.d.ts +31 -0
  145. package/dist/paths.js +147 -0
  146. package/dist/paths.js.map +1 -0
  147. package/dist/platform/accessibility.d.ts +139 -0
  148. package/dist/platform/accessibility.js +670 -0
  149. package/dist/platform/accessibility.js.map +1 -0
  150. package/dist/platform/cdp-driver.d.ts +318 -0
  151. package/dist/platform/cdp-driver.js +1179 -0
  152. package/dist/platform/cdp-driver.js.map +1 -0
  153. package/dist/platform/index.d.ts +11 -0
  154. package/dist/platform/index.js +69 -0
  155. package/dist/platform/index.js.map +1 -0
  156. package/dist/platform/keys.d.ts +17 -0
  157. package/dist/platform/keys.js +129 -0
  158. package/dist/platform/keys.js.map +1 -0
  159. package/dist/platform/launch-poll.d.ts +101 -0
  160. package/dist/platform/launch-poll.js +177 -0
  161. package/dist/platform/launch-poll.js.map +1 -0
  162. package/dist/platform/linux.d.ts +173 -0
  163. package/dist/platform/linux.js +1253 -0
  164. package/dist/platform/linux.js.map +1 -0
  165. package/dist/platform/macos.d.ts +136 -0
  166. package/dist/platform/macos.js +976 -0
  167. package/dist/platform/macos.js.map +1 -0
  168. package/dist/platform/native-desktop.d.ts +145 -0
  169. package/dist/platform/native-desktop.js +936 -0
  170. package/dist/platform/native-desktop.js.map +1 -0
  171. package/dist/platform/native-helper.d.ts +130 -0
  172. package/dist/platform/native-helper.js +592 -0
  173. package/dist/platform/native-helper.js.map +1 -0
  174. package/dist/platform/ocr-engine.d.ts +78 -0
  175. package/dist/platform/ocr-engine.js +363 -0
  176. package/dist/platform/ocr-engine.js.map +1 -0
  177. package/dist/platform/ps-runner.d.ts +28 -0
  178. package/dist/platform/ps-runner.js +228 -0
  179. package/dist/platform/ps-runner.js.map +1 -0
  180. package/dist/platform/types.d.ts +397 -0
  181. package/dist/platform/types.js +15 -0
  182. package/dist/platform/types.js.map +1 -0
  183. package/dist/platform/uri-handler.d.ts +75 -0
  184. package/dist/platform/uri-handler.js +273 -0
  185. package/dist/platform/uri-handler.js.map +1 -0
  186. package/dist/platform/wayland-backend.d.ts +53 -0
  187. package/dist/platform/wayland-backend.js +348 -0
  188. package/dist/platform/wayland-backend.js.map +1 -0
  189. package/dist/platform/windows.d.ts +232 -0
  190. package/dist/platform/windows.js +1210 -0
  191. package/dist/platform/windows.js.map +1 -0
  192. package/dist/postbuild.d.ts +10 -0
  193. package/dist/postbuild.js +98 -0
  194. package/dist/postbuild.js.map +1 -0
  195. package/dist/schema/snapshot.d.ts +33 -0
  196. package/dist/schema/snapshot.js +90 -0
  197. package/dist/schema/snapshot.js.map +1 -0
  198. package/dist/shortcuts.d.ts +30 -0
  199. package/dist/shortcuts.js +261 -0
  200. package/dist/shortcuts.js.map +1 -0
  201. package/dist/surface/cli.d.ts +7 -0
  202. package/dist/surface/cli.js +1556 -0
  203. package/dist/surface/cli.js.map +1 -0
  204. package/dist/surface/dashboard.d.ts +8 -0
  205. package/dist/surface/dashboard.js +1193 -0
  206. package/dist/surface/dashboard.js.map +1 -0
  207. package/dist/surface/doctor.d.ts +29 -0
  208. package/dist/surface/doctor.js +1514 -0
  209. package/dist/surface/doctor.js.map +1 -0
  210. package/dist/surface/format.d.ts +10 -0
  211. package/dist/surface/format.js +37 -0
  212. package/dist/surface/format.js.map +1 -0
  213. package/dist/surface/http-utility.d.ts +65 -0
  214. package/dist/surface/http-utility.js +336 -0
  215. package/dist/surface/http-utility.js.map +1 -0
  216. package/dist/surface/mcp-server.d.ts +91 -0
  217. package/dist/surface/mcp-server.js +280 -0
  218. package/dist/surface/mcp-server.js.map +1 -0
  219. package/dist/surface/onboarding.d.ts +15 -0
  220. package/dist/surface/onboarding.js +184 -0
  221. package/dist/surface/onboarding.js.map +1 -0
  222. package/dist/surface/pidfile.d.ts +79 -0
  223. package/dist/surface/pidfile.js +263 -0
  224. package/dist/surface/pidfile.js.map +1 -0
  225. package/dist/surface/readiness.d.ts +45 -0
  226. package/dist/surface/readiness.js +230 -0
  227. package/dist/surface/readiness.js.map +1 -0
  228. package/dist/surface/report.d.ts +68 -0
  229. package/dist/surface/report.js +341 -0
  230. package/dist/surface/report.js.map +1 -0
  231. package/dist/surface/skill-register.d.ts +14 -0
  232. package/dist/surface/skill-register.js +150 -0
  233. package/dist/surface/skill-register.js.map +1 -0
  234. package/dist/surface/version.d.ts +6 -0
  235. package/dist/surface/version.js +27 -0
  236. package/dist/surface/version.js.map +1 -0
  237. package/dist/tools/a11y.d.ts +8 -0
  238. package/dist/tools/a11y.js +545 -0
  239. package/dist/tools/a11y.js.map +1 -0
  240. package/dist/tools/a11y_depth.d.ts +19 -0
  241. package/dist/tools/a11y_depth.js +455 -0
  242. package/dist/tools/a11y_depth.js.map +1 -0
  243. package/dist/tools/agent.d.ts +15 -0
  244. package/dist/tools/agent.js +248 -0
  245. package/dist/tools/agent.js.map +1 -0
  246. package/dist/tools/batch.d.ts +46 -0
  247. package/dist/tools/batch.js +230 -0
  248. package/dist/tools/batch.js.map +1 -0
  249. package/dist/tools/cdp.d.ts +8 -0
  250. package/dist/tools/cdp.js +233 -0
  251. package/dist/tools/cdp.js.map +1 -0
  252. package/dist/tools/compact.d.ts +63 -0
  253. package/dist/tools/compact.js +418 -0
  254. package/dist/tools/compact.js.map +1 -0
  255. package/dist/tools/cost-class.d.ts +38 -0
  256. package/dist/tools/cost-class.js +117 -0
  257. package/dist/tools/cost-class.js.map +1 -0
  258. package/dist/tools/desktop.d.ts +9 -0
  259. package/dist/tools/desktop.js +346 -0
  260. package/dist/tools/desktop.js.map +1 -0
  261. package/dist/tools/electron_bridge.d.ts +41 -0
  262. package/dist/tools/electron_bridge.js +261 -0
  263. package/dist/tools/electron_bridge.js.map +1 -0
  264. package/dist/tools/extras.d.ts +22 -0
  265. package/dist/tools/extras.js +942 -0
  266. package/dist/tools/extras.js.map +1 -0
  267. package/dist/tools/favorites.d.ts +13 -0
  268. package/dist/tools/favorites.js +137 -0
  269. package/dist/tools/favorites.js.map +1 -0
  270. package/dist/tools/introspection.d.ts +13 -0
  271. package/dist/tools/introspection.js +55 -0
  272. package/dist/tools/introspection.js.map +1 -0
  273. package/dist/tools/ocr.d.ts +8 -0
  274. package/dist/tools/ocr.js +66 -0
  275. package/dist/tools/ocr.js.map +1 -0
  276. package/dist/tools/orchestration.d.ts +7 -0
  277. package/dist/tools/orchestration.js +377 -0
  278. package/dist/tools/orchestration.js.map +1 -0
  279. package/dist/tools/playbooks/extract-compose.d.ts +22 -0
  280. package/dist/tools/playbooks/extract-compose.js +85 -0
  281. package/dist/tools/playbooks/extract-compose.js.map +1 -0
  282. package/dist/tools/playbooks/find-replace.d.ts +11 -0
  283. package/dist/tools/playbooks/find-replace.js +56 -0
  284. package/dist/tools/playbooks/find-replace.js.map +1 -0
  285. package/dist/tools/playbooks/index.d.ts +63 -0
  286. package/dist/tools/playbooks/index.js +70 -0
  287. package/dist/tools/playbooks/index.js.map +1 -0
  288. package/dist/tools/playbooks/keys-blocklist.d.ts +24 -0
  289. package/dist/tools/playbooks/keys-blocklist.js +89 -0
  290. package/dist/tools/playbooks/keys-blocklist.js.map +1 -0
  291. package/dist/tools/registry.d.ts +40 -0
  292. package/dist/tools/registry.js +560 -0
  293. package/dist/tools/registry.js.map +1 -0
  294. package/dist/tools/safety-gate.d.ts +16 -0
  295. package/dist/tools/safety-gate.js +70 -0
  296. package/dist/tools/safety-gate.js.map +1 -0
  297. package/dist/tools/scheduler.d.ts +76 -0
  298. package/dist/tools/scheduler.js +413 -0
  299. package/dist/tools/scheduler.js.map +1 -0
  300. package/dist/tools/shortcuts.d.ts +13 -0
  301. package/dist/tools/shortcuts.js +205 -0
  302. package/dist/tools/shortcuts.js.map +1 -0
  303. package/dist/tools/smart.d.ts +15 -0
  304. package/dist/tools/smart.js +785 -0
  305. package/dist/tools/smart.js.map +1 -0
  306. package/dist/tools/types.d.ts +174 -0
  307. package/dist/tools/types.js +67 -0
  308. package/dist/tools/types.js.map +1 -0
  309. package/dist/tools/window-text.d.ts +15 -0
  310. package/dist/tools/window-text.js +39 -0
  311. package/dist/tools/window-text.js.map +1 -0
  312. package/dist/types.d.ts +122 -0
  313. package/dist/types.js +41 -0
  314. package/dist/types.js.map +1 -0
  315. package/native/Package.swift +38 -0
  316. package/native/README.md +113 -0
  317. package/native/Sources/ClawdCursorHelper/main.swift +602 -0
  318. package/native/Sources/ClawdCursorHost/main.swift +182 -0
  319. package/native/Sources/PermissionCheck/main.swift +53 -0
  320. package/native/Sources/ScreenshotHelper/main.swift +219 -0
  321. package/native/build.sh +139 -0
  322. package/native/entitlements.plist +12 -0
  323. package/package.json +115 -0
  324. package/scripts/banner.ps1 +112 -0
  325. package/scripts/coord-accuracy.ps1 +140 -0
  326. package/scripts/coord-uwp.ps1 +80 -0
  327. package/scripts/edge-glow.ps1 +180 -0
  328. package/scripts/find-element.ps1 +198 -0
  329. package/scripts/get-foreground-window.ps1 +71 -0
  330. package/scripts/get-screen-context.ps1 +183 -0
  331. package/scripts/get-windows.ps1 +66 -0
  332. package/scripts/install-panic-hotkey.ps1 +46 -0
  333. package/scripts/interact-element.ps1 +431 -0
  334. package/scripts/invoke-element.ps1 +314 -0
  335. package/scripts/linux/atspi-bridge.py +356 -0
  336. package/scripts/linux/ocr-recognize.py +154 -0
  337. package/scripts/mac/_window-picker.jxa +163 -0
  338. package/scripts/mac/find-element.jxa +0 -0
  339. package/scripts/mac/find-element.sh +161 -0
  340. package/scripts/mac/focus-window.jxa +284 -0
  341. package/scripts/mac/get-focused-element.jxa +102 -0
  342. package/scripts/mac/get-foreground-window.jxa +173 -0
  343. package/scripts/mac/get-screen-context.jxa +197 -0
  344. package/scripts/mac/get-ui-tree.sh +141 -0
  345. package/scripts/mac/get-windows.jxa +117 -0
  346. package/scripts/mac/interact-element.sh +235 -0
  347. package/scripts/mac/invoke-element.jxa +408 -0
  348. package/scripts/mac/ocr-recognize.swift +124 -0
  349. package/scripts/ocr-recognize.ps1 +102 -0
  350. package/scripts/postinstall-native.js +48 -0
  351. package/scripts/ps-bridge.ps1 +830 -0
  352. package/scripts/smoke-mcp.ps1 +119 -0
  353. package/scripts/sync-version.ts +178 -0
  354. package/scripts/verify-install.js +81 -0
package/CHANGELOG.md ADDED
@@ -0,0 +1,2264 @@
+ # Changelog
+
+ All notable changes to Clawd Cursor will be documented in this file.
+
+ ## [1.5.5] - 2026-06-16 — the skill follows the install (cross-framework)
+
+ ### Fixed
+
+ - **MCP-direct installs got the tools but not the skill.** The cross-framework
+ skill registration (Claude Code, OpenClaw, Codex, Cursor) lived *only* inside
+ `clawdcursor doctor` — which the MCP-first onboarding explicitly tells people to
+ skip. So an agent connected over MCP saw bare tools with none of the "how to use
+ me" knowledge (fallback positioning, the el_NN UI map, sustainable/autonomous
+ execution via the daemon + `task`), and clawdcursor stopped appearing as a skill.
+ Registration is now extracted into a shared module and runs on **`consent`** (the
+ always-required step) and via a new **`clawdcursor register-skill`** command, so
+ the skill installs into every detected agent framework regardless of install path.
+
+ ### Changed
+
+ - **Richer MCP server `instructions`.** Even an agent with no skill file (a host
+ that doesn't support skills) now learns the essentials on connect: drive UI
+ symbolically (`compile_ui` / `find_button` / `find_field` → `{element_id,
+ snapshot_id}`, survives layout shifts), verify with `expect`, the fallback-only
+ positioning, and where to find the full guide (the registered skill or
+ `clawdcursor.com/llms.txt`).
+
+ ## [1.5.4] - 2026-06-15 — install & distribution hardening
+
+ ### Changed
+
+ - **Installer is now `npm i -g`, not a git-clone-and-build.** The
+ `curl … | bash` / `irm … | iex` one-liners previously cloned the repo and ran
+ `npm install` + `npm run build` on the user's machine — requiring git and a
+ full build toolchain, and diverging from the `npm i -g clawdcursor` the README
+ advertises. They now install the published package globally. macOS still gets
+ a working native helper because the package's `postinstall` builds and
+ ad-hoc-signs it (ad-hoc is `build.sh`'s default). `VERSION=vX.Y.Z` still pins,
+ now via `clawdcursor@X.Y.Z`.
+ - **New Claude Code plugin** (`.claude-plugin/plugin.json`) registers the MCP
+ server in compact mode — launched via `npx -y clawdcursor` so there's **no
+ global install to do first** (npx fetches on demand, or uses a global install
+ if present), while still resolving the package `bin` so it survives entry-path
+ refactors — and bundles the root `SKILL.md`. A one-step, config-free install
+ for Claude Code. Manifest version auto-syncs via `scripts/sync-version.ts`
+ (and is guarded by the version-drift test).
+
+ ### Fixed
+
+ - **Back-compat entry point at `dist/index.js`.** v0.x shipped the CLI there;
+ v1.0 moved it to `dist/surface/cli.js`. Hosts that had hard-pinned
+ `node <pkg>/dist/index.js …` (e.g. a hand-written MCP entry in Claude Code's
+ `.claude.json`) silently broke on a routine `npm i -g clawdcursor` upgrade —
+ the MCP server just failed to start with no clear cause. A thin re-export
+ shim (`src/index.ts` → `dist/index.js`) now forwards to the real CLI, so those
+ pinned paths keep working across the move. New configs should still launch the
+ `clawdcursor` bin directly or use the Claude Code plugin, neither of which
+ pins a deep dist path.
+ - **`uninstall` no longer dead-ends.** It removes the global `clawdcursor`
+ command, so `clawdcursor install` can't follow it — and the old success
+ message only said how to delete *more*. Uninstall now prints the reinstall
+ one-liner (`npm i -g clawdcursor`, plus the OS turnkey installer), so there's
+ always an obvious way back.
+
+ ## [1.5.3] - 2026-06-14 — edge-glow indicator + security hardening
+
+ ### Added
+
+ - **Screen-edge "task in progress" glow.** A full-screen, click-through amber
+ glow pulses (dim ↔ bright) on all four screen edges whenever an agent is
+ driving the desktop — ambient, at-a-glance awareness that automation is live.
+ It rides the same lifecycle as the control-banner pill (shown together,
+ hidden together) and never steals focus or intercepts input: a per-pixel-alpha
+ layered window with `WS_EX_NOACTIVATE | WS_EX_TRANSPARENT`. Opt out of just the
+ glow with `CLAWD_NO_GLOW=1` — the pill (and its double-click-to-stop) stays.
+ Windows-only today, like the banner; the API is platform-neutral so
+ macOS/Linux overlays can land later. (`scripts/edge-glow.ps1`)
+
+ ### Security / hardening
+
+ - **Insecure temp files (CWE-377).** The `agent console` terminal scripts were
+ written to a predictable `tmpdir/clawdcursor-task-<time>.{ps1,sh}` path and
+ then executed; they now use a private `fs.mkdtemp()` directory. The macOS
+ screenshot temp moved from `Date.now()` to `crypto.randomUUID()`. A
+ source-invariant guard test keeps predictable temp-file names from returning.
+ - **Browser user-data dir** used a `/tmp` fallback that is wrong on Windows —
+ now `os.tmpdir()`. The unreachable pre-adapter launch fallback gained a
+ metacharacter guard so a crafted app name can't escape the PowerShell command.
+ - **Code-scanning sweep.** Closed the real CodeQL alerts and documented the
+ false positives (the snapshot fingerprint SHA-1 is a non-credential checksum;
+ an assertion `fs.open` is read-only). The transitive `file-type` advisory was
+ assessed unreachable (the vulnerable ASF path never runs) and dismissed.
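
The temp-file fix in the first bullet above can be illustrated with Node's standard library. This is a minimal sketch, not the package's actual code: the helper names `writePrivateScript` and `screenshotTempPath`, the directory prefix, and the file names are assumptions made for illustration. Only `fs.mkdtempSync`, `os.tmpdir`, and `crypto.randomUUID` are real Node APIs.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import { randomUUID } from "node:crypto";

// Hypothetical helper: write a script into a freshly created private directory
// instead of a predictable `<tmpdir>/name-<time>.sh` path.
function writePrivateScript(contents: string, extension: "ps1" | "sh"): string {
  // mkdtempSync appends random characters to the prefix and creates the
  // directory; on POSIX systems it is created with owner-only permissions.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "clawdcursor-task-"));
  const scriptPath = path.join(dir, `task.${extension}`);
  // "wx" fails if the file already exists, so a pre-planted file is never reused.
  fs.writeFileSync(scriptPath, contents, { mode: 0o700, flag: "wx" });
  return scriptPath;
}

// Hypothetical helper: an unguessable temp name instead of a timestamp-based one.
function screenshotTempPath(): string {
  return path.join(os.tmpdir(), `screenshot-${randomUUID()}.png`);
}
```

A caller would execute the returned script path and remove the directory afterwards; cleanup is left out of the sketch.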
+
+ ## [1.5.2] - 2026-06-13 — reliability, honest verification, transparency
+
+ The theme of this patch is **trust**: the cheap perception path works for
+ external agents again, a task can no longer claim success it can't back, and a
+ human at the machine always sees (and can stop) automation. Every fix came
+ from driving real apps live; all are regression-tested.
+
+ ### Fixed — perception over MCP (the big ones)
+
+ - **`read_screen` returned an empty tree for *every* app over MCP.** It didn't
+ default to the active window's pid, so the accessibility bridge built no
+ tree. It now resolves the foreground pid (parity with `find_element`) on
+ Windows, macOS, and Linux — the flagship cheap-perception path works again.
+ - **Every "element not found" stalled ~20 seconds.** The PowerShell bridge
+ emitted nothing for an empty result (the array unrolled to zero objects), so
+ the call timed out; a single match also unwrapped to a bare object and was
+ dropped. Both fixed — a miss now returns in well under a second.
+ - **`open_app` launched apps in the background**, so the next focused-window
+ action targeted the wrong window. It now brings the launched window to the
+ foreground.
+
+ ### Fixed — honest results (no false success)
+
+ - **Verification integrity.** A task that changed the screen can no longer be
+ marked `done` on evidence that was already true before it acted (an ambient
+ clock, an already-open window). New `file_changed_since_start` assertion
+ proves a file was actually written during the task.
+ - **`open_file` on a folder** no longer reports a bare "Opened" when Explorer
+ actually landed on Home — it verifies the folder window opened (and no
+ longer falls back to a Start-Menu search that types into the search box).
+ - **`open_uri` now opens `ms-settings:` and similar** COM-handler schemes via a
+ ShellExecute fallback (they have no launchable executable), instead of
+ failing with "no registered handler".
+
+ ### Changed — safety calibration
+
+ - **Key blocklist is now two-tier.** Genuinely dangerous combos
+ (Ctrl+Alt+Del, Win+L, force-quit, shutdown) stay hard-blocked; consequential
+ but legitimate ones (Win+D show-desktop, Ctrl+W close-tab, Alt+F4, Win+R…)
+ are now **confirm-tier** — usable with approval instead of dead-ended behind
+ a message that falsely promised a confirm path.
+ - **`minimize_window` no longer asks for confirmation** (tier 1, not 2) — it's
+ reversible, and the granular tool now matches the compound `window`
+ `{minimize}` surface that already allowed it.
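
The two-tier key blocklist described above could be modelled roughly as follows. This is an illustrative sketch only: the type `KeyTier`, the functions `canonicalCombo` and `classifyCombo`, the alias table, and the canonical string form are all assumptions, and the sets list only the combos the changelog names explicitly (force-quit and shutdown combos are omitted because their exact keys are not stated).

```typescript
type KeyTier = "blocked" | "confirm" | "allowed";

// Assumed alias table so "Del", "Control", etc. compare equal to their short forms.
const ALIASES: Record<string, string> = {
  del: "delete",
  control: "ctrl",
  windows: "win",
  super: "win",
  meta: "win",
};

const MODIFIER_ORDER = ["ctrl", "alt", "shift", "win"];

// Normalize "Alt + Ctrl + Del" and "ctrl+alt+delete" to the same string:
// lower-case, aliases applied, modifiers first in a fixed order.
function canonicalCombo(combo: string): string {
  const parts = combo
    .toLowerCase()
    .split("+")
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .map((p) => ALIASES[p] ?? p);
  const mods = MODIFIER_ORDER.filter((m) => parts.includes(m));
  const keys = parts.filter((p) => !MODIFIER_ORDER.includes(p));
  return [...mods, ...keys].join("+");
}

// Tier contents taken from the examples named in the changelog entry.
const HARD_BLOCKED = new Set(["ctrl+alt+delete", "win+l"]);
const CONFIRM_TIER = new Set(["win+d", "ctrl+w", "alt+f4", "win+r"]);

function classifyCombo(combo: string): KeyTier {
  const canonical = canonicalCombo(combo);
  if (HARD_BLOCKED.has(canonical)) return "blocked";
  if (CONFIRM_TIER.has(canonical)) return "confirm";
  return "allowed";
}
```

Checking the hard-blocked set first means a combo can never be downgraded to the confirm tier by appearing in both sets.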
+
+ ### Added — on-screen control banner (transparency)
+
+ - **"ClawdCursor — desktop control in progress" banner**: a topmost,
+ no-focus-steal pill at the top-center of the screen with a blinking red
+ recording dot, shown whenever an agent is actively driving the desktop —
+ pinned for the whole run of an autonomous task, and activity-triggered
+ (auto-hides after ~30s idle) for external agents driving over MCP
+ (stdio or HTTP). **Double-click it to stop**: runs the `clawdcursor stop`
+ flow (abort in-flight task → graceful shutdown). The human at the machine
+ always knows, and always has a kill switch. Windows today (macOS/Linux
+ adapters welcome — the controller is platform-neutral); disable with
+ `--no-banner` or `CLAWD_NO_BANNER=1`.
+
+ - Unmatched HTTP routes now return a JSON 404 with the endpoint list instead of
+ Express's default HTML error page.
+
+ ## [1.5.1] - 2026-06-12 — bulletproofing patch (live-session bugs)
+
+ Every fix in this patch came from a real failure observed while agents drove
+ real UIs — found in live runs, fixed at the root, regression-tested.
+
+ ### Fixed — safety
+
+ - **Coordinate clicks can no longer silently land on the wrong window.** When
+ Windows' foreground-lock defeats the pre-click activation (or the click
+ point is over a different window), `click`/`smart_click` now return a loud
+ **"⚠ FOCUS NOT CONFIRMED — DO NOT type next"** warning with the window that
+ was actually promoted, instead of a hollow success. The trigger was a real
+ keystroke leak: an OTP typed after a missed click went into a background
+ chat window.
+
+ ### Fixed — `task` delegation no longer times out MCP clients
+
+ - `task` / `delegate_to_agent` used to await the **whole** autonomous loop, so
+ any task longer than the client's per-call timeout (~60s) "timed out" while
+ the work finished invisibly. Now it waits up to `timeout` seconds (default
+ 45, clamped 1–50): finished → result as before; still running → a
+ `{status:"running"}` receipt with live progress while the loop continues.
+ Re-calling with the **same** task text re-attaches (never restarts); the
+ compact `task` tool gains `{action:"status"}` / `{action:"abort"}`.
+ A client-side timeout is **not** a task failure.
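
The bounded wait described in this entry can be sketched as a race between the task and a timer. The sketch below is an assumption about shape, not the package's implementation: `clampTimeout`, `awaitOrReceipt`, and the `TaskReply` type are hypothetical names. Only the default of 45 seconds, the 1 to 50 clamp, and the `{status:"running"}` receipt come from the changelog text.

```typescript
const DEFAULT_TIMEOUT_S = 45;
const MIN_TIMEOUT_S = 1;
const MAX_TIMEOUT_S = 50;

// Clamp a caller-supplied timeout into [1, 50]; fall back to 45 when it is
// missing or not a finite number (the fallback rule is an assumption).
function clampTimeout(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) {
    return DEFAULT_TIMEOUT_S;
  }
  return Math.min(MAX_TIMEOUT_S, Math.max(MIN_TIMEOUT_S, requested));
}

type TaskReply<T> = { status: "done"; result: T } | { status: "running" };

// Wait for `work` up to the clamped timeout. If the timer wins, return a
// "running" receipt; `work` itself is not cancelled and keeps going.
async function awaitOrReceipt<T>(
  work: Promise<T>,
  timeoutSeconds?: number,
): Promise<TaskReply<T>> {
  const ms = clampTimeout(timeoutSeconds) * 1000;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const receipt = new Promise<TaskReply<T>>((resolve) => {
    timer = setTimeout(() => resolve({ status: "running" }), ms);
  });
  const done = work.then((result): TaskReply<T> => ({ status: "done", result }));
  try {
    return await Promise.race([done, receipt]);
  } finally {
    // Clear the timer so a finished task does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Because the work promise is never cancelled, a later status call can re-attach to it, which matches the "re-attaches, never restarts" behaviour the entry describes.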
+
+ ### Fixed — perception honesty
+
+ - Window/element guards (`expect:{window:...}`) now normalize invisible
+ Unicode — Edge's title contains a no-break space in "Microsoft Edge" that
+ made correct guards fail.
+ - The a11y → CDP DOM fallback verifies the connected page actually corresponds
+ to the **focused** window before answering; it no longer reports another
+ browser's buttons as if they were on the focused page.
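
The invisible-Unicode problem in the first bullet above is easy to reproduce. The following sketch shows one plausible normalization; the function names `normalizeTitle` and `titleMatches`, the exact character set, and the case-insensitive substring match are assumptions, not the package's actual rules.

```typescript
// Zero-width characters that render as nothing (assumed set for illustration).
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g;

function normalizeTitle(title: string): string {
  return title
    .normalize("NFKC")        // folds compatibility forms, e.g. U+00A0 to a plain space
    .replace(ZERO_WIDTH, "")  // drop characters with no visible width
    .replace(/\s+/g, " ")     // collapse any remaining Unicode whitespace runs
    .trim()
    .toLowerCase();
}

// A window guard passes when the normalized expected text occurs in the
// normalized actual title.
function titleMatches(actualTitle: string, expected: string): boolean {
  return normalizeTitle(actualTitle).includes(normalizeTitle(expected));
}

// A title containing a no-break space (U+00A0) between the two words:
const edgeTitle = "Inbox - Microsoft\u00A0Edge";
const naive = edgeTitle.includes("Microsoft Edge");          // raw comparison
const guarded = titleMatches(edgeTitle, "Microsoft Edge");   // normalized comparison
```

Here `naive` is false because the raw title contains U+00A0 rather than an ordinary space, while `guarded` is true.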
+
+ ### Fixed — ergonomics
+
+ - The agent's dedicated browser launches maximized (fresh profiles used to
+ open as a tiny window).
+ - `consent` / README / website / `doctor --help` now state the two-path
+ onboarding truth: MCP setup is `consent` + (macOS) `grant` — `doctor` is
+ only for the autonomous `agent` mode. macOS: Accessibility is required;
+ Screen Recording is optional (vision fallback only).
+
+ ## [1.5.0] - 2026-06-11 — UI State Compiler + reactive verification
+
+ The headline of this release is a new perception substrate and a verification
+ discipline that together let a cheap text model drive the desktop reliably,
+ without reaching for screenshots. No tools were renamed — existing editor
+ permission allowlists keep working; v1.5.0 only **adds** capability.
+
+ ### Added — the el_NN UI State Compiler
+
+ - **`compile_ui`** fuses the accessibility tree and OCR into ONE confidence-scored,
+ source-attributed UI map: every element gets a stable `el_NN` id, a role, a
+ name, coordinates, and capability flags. Act on an element symbolically via
+ `{element_id, snapshot_id}` — near-free in tokens, DPI-proof, and it survives
+ layout shifts.
+ - **Semantic finders** `find_action_button(intent)` / `find_input_field(purpose)`
+ locate a target by meaning (synonyms + geometric label association) and return
+ the `el_NN` to act on, escalating to OCR only when the a11y tree is sparse.
+ - These are reachable from BOTH the granular surface and the compact
+ `accessibility` compound (`action: "compile_ui" | "find_button" | "find_field"`).
+
+ ### Added — reactive step discipline (Layer C)
+
+ - Consequential actions (`invoke_element`, `set_field_value`, `type`, `key`,
+ `click`, `drag`, …) take an optional **`expect`** array of assertions. After the
+ action, clawdcursor verifies the stated outcome — polling for a short settle
+ window so asynchronous UIs (chip resolution, lazy title updates) aren't falsely
+ failed — and reports a **DEVIATION** when the UI didn't obey, instead of
+ reporting a hollow success. The agent adapts rather than building on a false
+ assumption.
+ - A new `move` (hover) action and a stepped `drag` `path` (curve tracing) round
+ out the canvas/gesture surface.
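
The `expect` discipline described above (verify after the action, poll through a short settle window, report a DEVIATION on failure) might look like the sketch below. Everything here is hypothetical: the `Assertion` and `Outcome` shapes, the `verifyWithSettle` name, and the 1500 ms / 100 ms defaults are assumptions, since the changelog gives neither the assertion schema nor the length of the settle window.

```typescript
// An assertion is modelled as a description plus a synchronous probe.
interface Assertion {
  description: string;
  check: () => boolean;
}

type Outcome = { ok: true } | { ok: false; deviation: string };

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-check the assertions until they all hold or the settle window expires.
// Polling gives asynchronous UIs time to catch up before a failure is reported.
async function verifyWithSettle(
  assertions: Assertion[],
  settleMs: number = 1500,   // assumed default
  intervalMs: number = 100,  // assumed default
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<Outcome> {
  const deadline = Date.now() + settleMs;
  for (;;) {
    const failing = assertions.filter((a) => !a.check());
    if (failing.length === 0) return { ok: true };
    if (Date.now() >= deadline) {
      const names = failing.map((a) => a.description).join("; ");
      return { ok: false, deviation: `DEVIATION: ${names}` };
    }
    await sleep(intervalMs);
  }
}
```

Returning a structured deviation instead of throwing lets the calling agent decide how to adapt, which is the behaviour the entry emphasizes.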
+
+ ### Fixed — agent-loop reliability (internal audit)
+
+ - The post-action UI map is no longer invalidated the instant it's advertised —
234
+ `el_NN` refs offered for the next turn now actually resolve.
235
+ - Ref freshness no longer races the LLM round-trip (TTL widened; event-driven
236
+ invalidation + the window guard are the real staleness signals).
237
+ - `batch` steps now get the FULL single-call pipeline: label resolution for the
238
+ safety gate, active-app refresh between steps, outcome-gated map invalidation,
239
+ and per-step `expect` verification.
240
+ - Coordinate-space default follows context (image-space only while a screenshot
241
+ is actually in context; it no longer latches on for the rest of a run).
242
+ - Every screen-derived tool output (a11y, OCR, page DOM, clipboard) is wrapped in
243
+ `<untrusted-screen-content>` delimiters — prompt-injection defense now covers
244
+ every perception path, not just two.
245
+
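A minimal sketch of the untrusted-content wrapping mentioned in the last item above. Only the `<untrusted-screen-content>` delimiter comes from the changelog; the `wrapUntrusted` helper and the step that neutralizes embedded delimiters are assumptions about how such a wrapper could be hardened.

```typescript
const OPEN = "<untrusted-screen-content>";
const CLOSE = "</untrusted-screen-content>";

// Wrap screen-derived text so the model can tell data from instructions.
function wrapUntrusted(raw: string): string {
  // Assumption: neutralize any embedded delimiter so screen text cannot
  // close the wrapper early and smuggle instructions after it.
  const neutralized = raw.replace(
    /<(\/?untrusted-screen-content)>/gi,
    "&lt;$1&gt;",
  );
  return `${OPEN}\n${neutralized}\n${CLOSE}`;
}
```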
246
+ ### Fixed — external-agent (MCP) surface
247
+
248
+ - The `el_NN` substrate is now reachable over stdio MCP (a session UIMap holder
249
+ is constructed for the editor-hosted server, not only the daemon).
250
+ - The safety gate resolves `el_NN` refs to their element label over MCP too, so
251
+ destructive-label gating (Send/Delete/Pay) fires the same as in-loop; a
252
+ caller-supplied `expect` is honored on the MCP route.
253
+ - `cdp_connect` / `browser_connect` now disclose when they **attached to your
254
+ existing browser session** vs launched a dedicated agent-owned instance.
255
+ - `get_value` reads the editor's text via TextPattern (Windows) / non-empty
256
+ AXValue (macOS) when ValuePattern is empty — fixes false "value is blank"
257
+ reads on Win11 Notepad and the duplicate-write retries they caused.
258
+ - `read_clipboard` output is untrusted-wrapped; `close_window` warns it discards
259
+ all tabs/documents; dead `system` compound actions removed; `shortcuts_list`
260
+ drops platform-empty keys and de-duplicates.
261
+
262
+ ### Changed — security & browser ownership (post-RC hardening, same release)
263
+
264
+ - **Loopback-only bind is now enforced.** The daemon refuses to start when
265
+ `server.host` is a non-loopback address unless launched with
266
+ `--allow-remote` (which prints a loud warning). If you deliberately bind to
267
+ `0.0.0.0`/a LAN IP, add the flag; otherwise set the host back to `127.0.0.1`.
268
+ - **The agent's dedicated browser moved to its own CDP port** (`9333`, env
269
+ `CLAWD_AGENT_CDP_PORT`); port `9223` is now reserved for browsers *you* put
270
+ on the wire (`relaunch_with_cdp`, your own debug flags). Ownership is encoded
271
+ in the port, the dedicated instance's window is labeled
272
+ *"ClawdCursor — agent browser"*, and in attached mode navigation mechanically
273
+ opens the agent's **own tab** — your tabs are never navigated away.
274
+ - `mouse_triple_click` follows up with select-all when it lands in an edit
275
+ field, so typing after it replaces pre-filled text (Save As dialogs).
276
+ - Dependencies: commander 15, zod 4 (the MCP SDK peer-supports both), tsx
277
+ 4.22.4.
278
+ - CI: coverage ratchet thresholds + a production-path perf tripwire join the
279
+ existing npm-audit gate; the MCP SDK boundary is now explicitly typed.
280
+
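The loopback-only bind rule in the first item above can be sketched as a small start-up check. The function names and the `StartDecision` shape are hypothetical; only the `server.host` setting, the `--allow-remote` flag, and the refuse-unless-flagged behavior come from the changelog.

```typescript
// True for localhost, IPv6 loopback, and any address in 127.0.0.0/8.
function isLoopbackHost(host: string): boolean {
  const h = host.trim().toLowerCase();
  if (h === "localhost" || h === "::1" || h === "[::1]") return true;
  const m = /^127\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(h);
  return m !== null && m.slice(1).every((octet) => Number(octet) <= 255);
}

type StartDecision = { start: boolean; warning?: string };

// allowRemote stands in for the --allow-remote flag.
function decideStart(host: string, allowRemote: boolean): StartDecision {
  if (isLoopbackHost(host)) return { start: true };
  if (allowRemote) {
    return { start: true, warning: `binding to non-loopback host ${host}` };
  }
  return { start: false };
}
```

With this shape, `0.0.0.0` or a LAN address only starts when the flag is passed, and it always carries a warning when it does.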
281
+ ### Fixed — macOS parity (cross-platform audit)
282
+
283
+ - **el_NN now works on macOS.** The role map was Windows-UIA-only, so macOS AX
284
+ text fields and links resolved to "unknown" and the find/fill/link-click path
285
+ was effectively dead — added the AX role synonyms.
286
+ - **macOS password fields are redacted.** Secureness lives in the AX *subrole*
287
+ (`AXSecureTextField`); the helper now reads it and withholds the value, so a
288
+ secret never reaches the prompt or the fingerprint.
289
+ - The no-coordinate `scroll` center is computed in the driver's coordinate space
290
+ (logical points on Retina) instead of mislanding 2× off.
291
+ - macOS UI-tree traversal deepened to match Windows (depth 8), so `compile_ui`
292
+ sees real apps instead of a near-empty tree.
293
+ - README corrected: `clawdcursor grant` approves permissions; it does not build
294
+ the native helper.
295
+
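The password-field redaction above can be illustrated with a short sketch. The `AxNode` shape and `redactSecure` helper are assumptions; the changelog only states that secureness is carried by the `AXSecureTextField` subrole and that the value is withheld.

```typescript
interface AxNode {
  role: string;
  subrole?: string;
  value?: string;
}

const SECURE_SUBROLE = "AXSecureTextField";

// Return a copy of the node without its value when the subrole marks it
// as a secure field; other nodes pass through unchanged.
function redactSecure(node: AxNode): AxNode {
  if (node.subrole !== SECURE_SUBROLE) return node;
  return { role: node.role, subrole: SECURE_SUBROLE };
}
```

Checking the subrole rather than the role matters because, per the entry above, the secure marker does not live in the role itself.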
296
+ ## [1.0.4] - 2026-06-07 — fix Windows minimize/resize (#153)
297
+
298
+ - **`window minimize` (and `window resize`) silently did nothing on Windows.**
299
+ Root cause: the PowerShell those commands run is built as a single concatenated
300
+ line and executed via `powershell.exe -Command <string>`, but it opened the
301
+ `Add-Type -MemberDefinition` block with a **here-string** (`@"…"@`). A here-string
302
+ header must be the last token on its line — on a single line PowerShell raises
303
+ *"No characters are allowed after a here-string header before the end of the line"*
304
+ and the **entire script fails to parse**, so the call produced no output and
305
+ returned `false`. Reported for UWP apps (Calculator/Settings) but it affected
306
+ every window. Switched to a single-quoted `-MemberDefinition '…'` (C# double-quotes
307
+ are literal inside it). Fixed in `setWindowState` (minimize/maximize/restore/close)
308
+ and `setWindowBounds` (resize); a static guard test prevents the here-string from
309
+ returning.
310
+ - Minimize now also drives the transition through the UIA `WindowPattern`
311
+ (`SetWindowVisualState`) with a title-first window lookup, the supported
312
+ cross-process path for UWP / ApplicationFrameHost-hosted windows whose Win32
313
+ `ShowWindow(SW_MINIMIZE)` no-ops; falls back to `ShowWindowAsync` for plain Win32.
314
+ Verified live on Calculator: minimize / restore / maximize / restore all succeed.
315
+
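The static guard mentioned above might look something like the check below. This is a heuristic sketch under stated assumptions, not the project's test: `hasInlineHereString` is a hypothetical name, and the regular expression simply flags a here-string header (`@"` or `@'`) that is followed by more characters on the same line.

```typescript
// Returns true when a script destined for `powershell.exe -Command <string>`
// contains a here-string header with text after it on the same line,
// which PowerShell refuses to parse.
function hasInlineHereString(script: string): boolean {
  return /@["'][^\r\n]/.test(script);
}
```

A guard of this kind would pass for the single-quoted `-MemberDefinition '…'` form and fail if a concatenated one-line script reintroduced `@"…"@`.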
316
+ ## [1.0.3] - 2026-06-07 — fix macOS install/update loop (#155)
317
+
318
+ - **macOS updates were blocked after the first install.** `native/build.sh` writes
319
+ the helper into the git tree (`native/ClawdCursor.app/`, `native/.build/`), but
320
+ those weren't gitignored — so `install.sh`'s clean-tree guard saw a "dirty" tree
321
+ and refused every subsequent update. Now gitignored, and the generated
322
+ `native/ClawdCursor.app/Contents/Info.plist` (which made git descend into the
323
+ `.app` and surface the untracked binaries) is untracked — `build.sh` regenerates
324
+ it. The `.app` is built on-device and was never in the npm package.
325
+ - `clawdcursor uninstall` now also removes the native build artifacts.
326
+
327
+ ## [1.0.2] - 2026-06-07 — resilient uninstall
328
+
329
+ - **`clawdcursor uninstall` no longer crashes on Windows when a file is locked.**
330
+ A still-held handle on `~/.clawdcursor` (a running daemon, or the process's own
331
+ log file) raised `EPERM`, which escaped as an `unhandledRejection` and aborted
332
+ the uninstall half-done (config removed, global link + data dir left behind).
333
+ Each removal step now retries transient locks (`rmSync` maxRetries) and, on a
334
+ hard failure, warns + continues + lists the leftovers to delete manually —
335
+ instead of crashing the whole command.
336
+
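The warn-and-continue behavior described above can be sketched as a loop that records leftovers instead of throwing. The `removeAll` helper and its report shape are hypothetical. In Node, the injected remover could be `fs.rmSync` with its `recursive`, `force`, `maxRetries`, and `retryDelay` options; the specific retry values in the comment are assumptions.

```typescript
type Remover = (path: string) => void;

interface UninstallReport {
  removed: string[];
  leftovers: string[];
}

// In Node the remover could be, for example:
//   (p) => fs.rmSync(p, { recursive: true, force: true, maxRetries: 5, retryDelay: 200 })
// (retry values are illustrative assumptions).
function removeAll(
  paths: string[],
  remove: Remover,
  warn: (msg: string) => void,
): UninstallReport {
  const report: UninstallReport = { removed: [], leftovers: [] };
  for (const path of paths) {
    try {
      remove(path);
      report.removed.push(path);
    } catch (err) {
      // A hard failure on one step must not abort the remaining steps.
      const reason = err instanceof Error ? err.message : String(err);
      warn(`could not remove ${path}: ${reason}`);
      report.leftovers.push(path);
    }
  }
  return report;
}
```

The returned `leftovers` list is what a CLI could print so the user knows what to delete manually.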
337
+ ## [1.0.1] - 2026-06-06 — first npm publish + code-scanning cleanup
338
+
339
+ - First v1.x release published to the npm registry (`npm i -g clawdcursor`).
340
+ - Cleaned 4 CodeQL `js/unused-local-variable` notes (dead `shotToBlock` helper in
341
+ agent.ts, unused `beforeEach`/`invokeTool` in the characterization test, unused
342
+ `STEPS` const in scripts/measure-batch-tokens.ts). No behavior change.
343
+
344
+ ## [1.0.0] - 2026-06-06 — toolbox-first: pipeline removed, tools unified, thin agent loop
345
+
346
+ > **Breaking (major).** clawdcursor is now a desktop MCP **toolbox** for any agent, plus a thin *optional* autonomous loop. The autonomous morph pipeline (router → blind/hybrid/vision, decompose, verify, reflector) is gone — a capable model is its own pipeline. The `task` tool still hands a whole task to a cheaper configured model that "takes the wheel"; 4 pipeline-introspection tools were removed (catalog 98 → 94).
347
+
348
+ ### macOS
349
+
350
+ - **#154 (HiDPI/Retina mouse):** clicks/drags/moves no longer land ~2× off-target — mouse coords now map image-space → **logical** points on macOS (nut-js drives in logical points), physical on Windows/Linux. *(Correct by construction; needs real-Mac verification.)*
351
+ - **#150 / #151:** native helper bundle is signable (Info.plist generated, comment-free entitlements) and the mac/linux runtime scripts ship in the package. *(Confirmed on a real Mac, macOS 26.)*
352
+ - **#149:** screenshot helper inherits the daemon's Screen-Recording grant — ad-hoc signing no longer uses hardened runtime. *(Pending real-Mac re-verification.)*
353
+ - `window focus` by `processId` / `processName` now works on macOS (the JXA flag names were wrong).
354
+
355
+ ### Perception — cheap-first guidance made explicit
356
+
357
+ The MCP connect-time instructions and tool descriptions now spell out the escalation: read the accessibility tree first → OCR when the tree is empty/sparse → screenshot only as a last resort; prefer named-target actions over pixel coordinates. Every tool also carries a `[act] < [inspect] < [perceive-text] < [perceive-image]` cost-class prefix.
358
+
359
+ ### Removed — autonomous pipeline cluster (~13,000 LOC)
360
+
361
+ The router → blind/hybrid/vision morph ladder, preprocessor, decomposer, classifier,
362
+ verifier (ground-truth signals), Reflector, and knowledge/guide loader have all been
363
+ deleted. The file surface removed:
364
+
365
+ - `src/core/pipeline.ts`, `src/core/verifier.ts`, `src/core/compound.ts`,
+ `src/core/palettes.ts`, `src/core/handoff.ts`, `src/core/desktop-survey.ts`
+ - `src/core/classify/` (full directory)
+ - `src/core/decompose/` (full directory)
+ - `src/core/skills/` (full directory)
+ - `src/core/router/` (full directory)
+ - `src/core/knowledge/` (full directory)
372
+
373
+ Four granular tools removed alongside the pipeline:
374
+ `classify_task`, `detect_app`, `get_app_guide`, `learn_app`.
375
+
376
+ The `clawdcursor guides` CLI command is removed.
377
+
378
+ ### Changed — thin agent loop replaces the morph ladder
379
+
380
+ `agent.ts` is rewired to a single `runAgent` loop: the configured model perceives the
381
+ desktop (a11y → OCR → screenshot as needed), selects tools, and iterates until done or
382
+ the turn budget is exhausted. No rung selection, no mode flags, no rung escalation.
383
+ `AgentInput` is simplified: `task / maxTurns / isAborted / targetWindow` only.
384
+
385
+ `buildUnifiedTools()` and `buildSystemPrompt()` no longer accept a mode or capability
386
+ argument — they return the full unified toolbox.
387
+
388
+ ### Changed — MCP tool count
389
+
390
+ Granular catalog drops from 98 to **94 tools** (the four pipeline-only tools removed).
391
+ Compact surface: `computer` · `accessibility` · `window` · `system` · `browser` · `task` · `batch` = **7 entries**.
392
+
393
+ ### Changed — `task` delegation
394
+
395
+ `submit_task` → `agent.executeTask` → `_executeTask` → `runAgent`. The thin loop is the
396
+ configured model self-driving the toolbox. Framing: an expensive external agent can
397
+ delegate grunt work to clawdcursor's cheaper configured model, which takes the wheel.
398
+
399
+ ### Added — `batch` tool
400
+
401
+ New `batch` tool collapses N tool calls into one round-trip (declarative, guarded,
402
+ safety-gated per step). Each step is `{ name, arguments, expect? }`; optional `expect`
403
+ re-perceives before the step and halts on mismatch. On any guard miss, safety stop, or
404
+ error the batch halts and returns a per-step trace. `dryRun:true` pre-scans safety tiers
405
+ without executing. The efficiency lever for a driving agent: N calls → 1.
406
+
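The halt-on-first-problem behavior of `batch` can be sketched as below. Only the step shape `{ name, arguments, expect? }`, the per-step trace, and `dryRun` come from the changelog. The `BatchDeps` callbacks, the string-typed `expect`, and the trace statuses are assumptions, and this sketch leaves out the safety-tier scan that the real `dryRun` performs.

```typescript
interface BatchStep {
  name: string;
  arguments: Record<string, unknown>;
  expect?: string; // hypothetical: a guard checked before the step runs
}

type StepTrace = {
  name: string;
  status: "ok" | "guard_miss" | "error" | "planned";
  detail?: string;
};

interface BatchDeps {
  call: (name: string, args: Record<string, unknown>) => void; // throws on failure
  guardHolds: (expect: string) => boolean;
}

function runBatch(steps: BatchStep[], deps: BatchDeps, dryRun = false): StepTrace[] {
  const trace: StepTrace[] = [];
  for (const step of steps) {
    if (dryRun) {
      // Nothing executes in a dry run; the step is only listed.
      trace.push({ name: step.name, status: "planned" });
      continue;
    }
    if (step.expect !== undefined && !deps.guardHolds(step.expect)) {
      trace.push({ name: step.name, status: "guard_miss", detail: step.expect });
      break; // halt: the UI is not in the state the caller assumed
    }
    try {
      deps.call(step.name, step.arguments);
      trace.push({ name: step.name, status: "ok" });
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      trace.push({ name: step.name, status: "error", detail });
      break; // halt on the first error and return what ran so far
    }
  }
  return trace;
}
```

Returning the partial trace lets the driving agent see exactly which step stopped the batch without replaying the earlier ones.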
407
+ ---
408
+
409
+ ### Tool-unification migration (also part of 1.0.0)
410
+
411
+ ### Changed — one tool implementation, used everywhere
412
+
413
+ The MCP tool surface and the internal autonomous agent-loop used to carry **two
414
+ parallel implementations** of ~35 of the same tools (~2,100 LOC of duplication).
415
+ The MCP surface now **projects from the agent-loop (System B) implementations** via
416
+ `projectToToolDefinition`, so external agents inherit the reliability tweaks that
417
+ were previously internal-only: smushed-coordinate coercion, focus-theft
418
+ detection/reporting, automatic pid-scoping for a11y searches, the clipboard
419
+ paste fast-path, and conditional coordinate scaling.
420
+
421
+ - ~34 tools migrated (window, keyboard, mouse, a11y/perception, CDP). **Tool names
422
+ are unchanged — no renames** (the MCP catalog stays at 98 tools), so existing
423
+ editor/agent permission allowlists keep working. Parameters are backward-compatible
424
+ with one exception: `mouse_drag` drops the `x1/y1/x2/y2` convenience aliases (use the
425
+ canonical `startX/startY/endX/endY`, which are unchanged).
426
+ - Tools where System A is richer or unique are **kept on System A**: `ocr_read_screen`
427
+ (structured `elements[]`+bounds output), `smart_*`, `find_element`,
428
+ `navigate_browser` (the browser *launcher*), `cdp_evaluate/select/wait/tabs/scroll`,
429
+ and the extra mouse variants.
430
+ - A shared characterization test-suite pins the System B behaviors so the projection
431
+ can't silently regress them.
432
+ - (Pending) deletion of the now-dead System A handler bodies — the LOC drop lands
433
+ in a follow-up; this release makes System B the single source of truth.
434
+
435
+ ### Fixed
436
+
437
+ - **Packaging (#151):** the published package now ships the macOS (`scripts/mac/`)
438
+ and Linux (`scripts/linux/`) runtime scripts. Previously only Windows `.ps1` files
439
+ were whitelisted, so accessibility/window/OCR tools were dead on mac/Linux installs
440
+ — the same class of bug as the earlier Windows-bridge omission.
441
+ - **macOS native helper (#150):** `native/build.sh` now generates `Contents/Info.plist`
442
+ (without it the `.app` is an invalid, unsignable bundle) and `entitlements.plist` no
443
+ longer contains XML comments that `codesign`'s AMFI parser rejects. Unblocks the
444
+ signed-bundle path that TCC (Accessibility / Screen Recording) and #149 depend on.
445
+ (Final macOS sign/run verification is tracked in #150 / #149.)
446
+ - **Compact-surface friction:** native-name aliases stop the MCP validator from
447
+ silently dropping a correctly-intended arg; a central required-arg guard converts the
448
+ crash-on-undefined class into actionable errors; `open_app`/`open_file`/`open_url` are
449
+ reachable from the `system` compound (not just `window`); an unknown action now names
450
+ the compound that owns it; `key_press` accepts space-separated key sequences.
451
+ - **a11y consistency:** `smart_click` / `smart_type` / `smart_read` accept `name` as an
452
+ alias for `target` (the rest of the accessibility surface uses `name`).
453
+ - Confirm-tier safety and `task`-unavailable error messages are now actionable.
454
+
455
+ ### Behavior changes (v2)
456
+
457
+ ### Migration notes (v2 behavior change)
458
+
459
+ **`mouse_click` / `mouse_drag` / `mouse_scroll` — `space:'screen'` no longer double-scales**
460
+
461
+ External MCP callers that omit the `space` parameter are **unaffected** — omitting `space` continues to default to `'image'`, which applies the same image→physical scaling that all previous releases applied.
462
+
463
+ The one behavior change is for callers that explicitly pass `space:'screen'`:
464
+
465
+ | Caller behavior | v1.x result | v2 result |
+ |---|---|---|
+ | `{x, y}` (no `space`) | scaled (image→physical) | scaled (image→physical), **unchanged** |
+ | `{x, y, space:'image'}` | scaled (image→physical) | scaled (image→physical), **unchanged** |
+ | `{x, y, space:'screen'}` | **double-scaled** (bug) | pass-through, **fixed** |
470
+
471
+ If your agent passes a11y-snapshot coordinates via `mouse_click` / `mouse_drag` / `mouse_scroll` and previously compensated by dividing by the DPI ratio before sending, remove that compensation after upgrading.
472
+
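The `space` contract above can be sketched in a few lines. `toPhysical` and the single `scale` factor are hypothetical simplifications; only the `'image'` default and the pass-through for `'screen'` come from the migration note.

```typescript
type Space = "image" | "screen";

interface Point {
  x: number;
  y: number;
  space?: Space;
}

// scale = physical pixels per image pixel (assumed equal on both axes).
function toPhysical(p: Point, scale: number): { x: number; y: number } {
  const space = p.space ?? "image"; // omitted `space` keeps the legacy default
  if (space === "screen") {
    return { x: p.x, y: p.y }; // pass-through: no second scaling
  }
  return { x: Math.round(p.x * scale), y: Math.round(p.y * scale) };
}
```

A caller that used to pre-divide screen coordinates by the DPI ratio would now land off-target, which is why the note asks for that compensation to be removed.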
473
+ ### Implementation notes
474
+
475
+ - `mouse_click`, `mouse_drag`, `mouse_scroll`, `mouse_move_relative`, `mouse_down`, `mouse_up` are now projected from System B (`buildUnifiedTools`) via `projectToToolDefinition` (the same uniform path used by the window and keyboard groups in Steps 3–4).
476
+ - The projected coord-sensitive tools (`click`, `drag`, `scroll`) inject `space:'image'` as the default when the caller omits it, preserving the legacy scaling contract.
477
+ - System A handlers for these six tools are intentionally kept (Step 8 handles removal).
478
+ - Tools left on System A (no System B granular equivalent): `mouse_hover`, `mouse_double_click`, `mouse_right_click`, `mouse_middle_click`, `mouse_triple_click`, `mouse_scroll_horizontal`, `mouse_drag_stepped`.
479
+ - **`mouse_drag`**: the `x1/y1/x2/y2` convenience aliases are removed; use the canonical `startX/startY/endX/endY` (unchanged, still required). Callers already using the canonical names are unaffected.
480
+
481
+ **`mouse_scroll` — `x` and `y` are no longer required**
482
+
483
+ System A required `x`, `y`, and `direction`. In v2 only `direction` is required; omitting `x`/`y` scrolls at the screen center (safe default). Callers that always supply `x`/`y` are unaffected.
484
+
485
+ | Caller behavior | v1.x result | v2 result |
+ |---|---|---|
+ | `{x, y, direction}` | scrolls at (x,y) | scrolls at (x,y), **unchanged** |
+ | `{direction}` (no x/y) | schema validation error (x/y required) | scrolls at screen center |
489
+
490
+ **`key_press` — `key` param removed from JSON-Schema `required` array**
491
+
492
+ System A's JSON schema listed `key` as required. In v2 the schema lists neither `combo` nor `key` as required (the execute body still guards the total absence and returns an actionable error). Callers supplying the `key` param are fully unaffected; the only change is that MCP-level schema validation no longer rejects a missing-key call before it reaches the handler.
493
+
494
+ | Caller behavior | v1.x result | v2 result |
+ |---|---|---|
+ | `{key: "Return"}` | runs normally | runs normally, **unchanged** |
+ | `{}` (no key) | schema validation error | handler-level error (actionable message) |
498
+
499
+ **`set_field_value` — category corrected from `'window'` to `'perception'`**
500
+
501
+ TOOL_META had `set_field_value` category as `'window'`; System A's `a11y_depth.ts` definition uses `'perception'`. The mismatch is corrected: the projected tool now reports `category: 'perception'`, matching the System A original. This is a routing/metadata fix with no behavioral change.
502
+
503
+ **`invoke_element` — `automationId` matching now falls back to name-based search**
504
+
505
+ The `automationId` parameter is accepted for backward-compat but the `PlatformAdapter.invokeElement` interface does not expose automationId filtering. When a caller passes only `automationId` (no `name`), the value is used as the `name` search string, which is a best-effort fallback.
506
+
507
+ | Caller behavior | v1.x result | v2 result |
+ |---|---|---|
+ | `{name: "OK"}` | name-based a11y match | same, **unchanged** |
+ | `{automationId: "btn_ok"}` | exact automationId match | uses `automationId` as name string (best-effort) |
+ | `{name: "OK", automationId: "btn_ok"}` | name + automationId match | name is used; automationId is accepted but not narrowing |
512
+
513
+ For precise automationId targeting, prefer `find_element` (which filters by automationId) followed by `invoke_element` with the found element's `name`.
514
+
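The fallback rule in the table above reduces to a small precedence check. `resolveSearchName` is a hypothetical helper, and the error thrown when both fields are missing is an assumption rather than documented behavior.

```typescript
interface InvokeArgs {
  name?: string;
  automationId?: string;
}

// Pick the string that the name-based search will use.
function resolveSearchName(args: InvokeArgs): string {
  if (args.name) return args.name; // name wins; automationId does not narrow
  if (args.automationId) return args.automationId; // best-effort fallback
  throw new Error("invoke_element needs `name` or `automationId`");
}
```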
515
+ **`cdp_connect` — now auto-launches a browser when none is running**
516
+
517
+ Previously `cdp_connect` only attached to an already-running Chrome/Edge process.
518
+ In v2 it auto-launches Edge/Chrome with the CDP debug port if no browser is connected.
519
+
520
+ | Caller behavior | v1.x result | v2 result |
+ |---|---|---|
+ | No browser running | error "Failed to connect…" | launches Edge/Chrome, then connects |
+ | Browser already running | attaches | attaches, **unchanged** |
524
+
525
+ If you previously launched the browser manually (via `navigate_browser`) before calling `cdp_connect`, that workflow continues to work. The new behavior is additive.
526
+
527
+ **`cdp_page_context` — gains an optional `selector` param**
528
+
529
+ Previously `cdp_page_context` took no parameters and always returned the full structured
530
+ interactive-element list for the page.
531
+ In v2 callers may pass an optional CSS `selector`; when present, the tool returns the
532
+ plain-text content of the matching element instead of the full element list.
533
+
534
+ | Caller behavior | v1.x result | v2 result |
+ |---|---|---|
+ | No params | structured interactive-element list | same, **unchanged** |
+ | `{selector: "main"}` | invalid param (ignored or error) | text content of `main` element |
538
+
539
+ Callers that pass no params are fully unaffected. The no-param path returns the same
540
+ `getPageContext()` result as before.
541
+
542
+ ### Implementation notes (Step 7 — CDP / browser group)
543
+
544
+ - `cdp_connect`, `cdp_page_context`, `cdp_click`, `cdp_type` are now projected from System B
545
+ (`buildUnifiedTools`) via `projectToToolDefinition` (the same uniform path used by Steps 3–6).
546
+ - System A handlers for these four tools are intentionally kept (Step 8 handles removal).
547
+ - **`navigate_browser` is NOT migrated.** System A's `navigate_browser` is a browser-launcher
548
+ tool (`safetyTier 2`, `category: 'orchestration'`) that spawns Edge/Chrome with
549
+ `--remote-debugging-port`. System B's `browser_navigate` is a within-session navigation call
550
+ that requires a prior `browser_connect`. Projecting `browser_navigate` as `navigate_browser`
551
+ would silently strip the launch capability and break external callers.
552
+ - Tools left on System A (no System B equivalent in `buildUnifiedTools()`):
553
+ `navigate_browser`, `cdp_read_text`, `cdp_select_option`, `cdp_evaluate`,
554
+ `cdp_wait_for_selector`, `cdp_list_tabs`, `cdp_switch_tab`, `cdp_scroll`.
555
+
556
+ ---
557
+
558
+ ## [1.0.0-autonomous] - 2026-06-03 — adaptive pipeline variant (superseded by the toolbox 1.0.0; preserved on branch `v1.0.0-autonomous`)
559
+
560
+ ### Upgrading from 0.9.x
561
+
562
+ **MCP server id.** The server id has been `clawdcursor` since v0.9.0 (it
563
+ was `clawd-cursor` before that). If your editor re-prompts for every tool
564
+ call after upgrading, your allowlist entries are keyed to the old id or to
565
+ individual tool names. Switch to the **server-level wildcard**:
566
+
567
+ ```
+ mcp__clawdcursor
+ ```
570
+
571
+ A single wildcard entry covers all current and future tools and survives
572
+ tool renames across versions — per-tool entries like
573
+ `mcp__clawdcursor__window` silently break whenever a tool is added,
574
+ removed, or renamed.
575
+
576
+ ### Added — text ↔ vision handoff in the adaptive pipeline
577
+
578
+ The pipeline now switches between text-only and vision rungs mid-task
579
+ when the verifier signals a mismatch, rather than restarting. Spatial
580
+ gestures (drag into / onto) correctly morph to the vision rung instead of
581
+ staying blind.
582
+
583
+ ### Added — cost-class metadata on all 97 granular tools
584
+
585
+ Every granular tool is stamped with a `costClass` (`act` / `inspect` /
586
+ `perceive-text` / `perceive-image`). The class is exposed in the MCP
587
+ `tools/list` description prefix so external agents can select the
588
+ cheapest viable tool without reading the full schema.
589
+
590
+ ### Added — desktop-survey grounding for the preprocessor
591
+
592
+ The preprocessor and decomposer now plan from live desktop perception
593
+ (open windows + OS-default handlers) instead of static app guesses.
594
+ The stay-in-target-window guardrail refuses actions against windows that
595
+ were not open when the task started.
596
+
597
+ ### Added — intent-driven email compose-send
598
+
599
+ `compose-send` only auto-fires the Send action when the task description
600
+ explicitly requests sending. Tasks that ask to draft or compose leave a
601
+ pre-filled draft open instead of dispatching immediately.
602
+
603
+ ### Added — CDP/DOM browser rung for the autonomous agent
604
+
605
+ For web tasks the autonomous agent can drive a dedicated, agent-owned
606
+ browser through the DOM (CSS selectors / visible text, no pixels) instead
607
+ of OCR-on-the-desktop plus coordinate clicks. The instance is launched with
608
+ its own profile so it never closes, reuses, or steals focus from the user's
609
+ own browser windows. Degrades gracefully to OCR (`read_text` / `smart_click`)
610
+ when CDP isn't available.
611
+
612
+ ### Added — OCR perception on the cheap text rung
613
+
614
+ `read_text` and `smart_click` let the text model read and click webview /
615
+ canvas content via OCR — no escalation to the vision model.
616
+
617
+ ### Fixed — npm package shipped without the Windows bridge + OCR scripts (critical)
618
+
619
+ `scripts/ps-bridge.ps1` (the persistent UIA bridge) and `scripts/ocr-recognize.ps1`
620
+ were never in the package.json `files` whitelist, so a real `npm install` shipped
621
+ without them. On Windows the bridge crashed on every spawn in an infinite restart
622
+ loop, leaving the whole desktop-perception layer dead — `list_windows` returned 0,
623
+ the accessibility tree was empty, OCR failed — so the agent could launch apps but
624
+ was blind. This affected every published install (0.9.7–0.9.9); it was masked in
625
+ development by `npm link`. Now `scripts/*.ps1` ships in the package.
626
+
627
+ ### Added — Windows panic-stop hotkey
628
+
629
+ `scripts/install-panic-hotkey.ps1` installs a global keyboard shortcut
630
+ (default Ctrl+Alt+K) that force-kills every clawdcursor process — the daemon and
631
+ its PowerShell UIA/OCR children — instantly, for when an autonomous run misbehaves.
632
+
633
+ ### Fixed — Save As filename field on Windows
634
+
635
+ The granular `set_field_value` → `invoke-element set-value` path in
636
+ `ps-bridge.ps1` lacked the composite handling added to the compound
637
+ `set_value` path in v0.9.7. The "File name:" label is a read-only Text
638
+ element; the fix resolves the writable sibling Edit control via
639
+ `LabeledBy` before writing, with a keyboard-sequence fallback.
640
+
641
+ ### Fixed — CLI flags honoured in non-interactive mode
642
+
643
+ `--provider` and `--model` flags passed to `clawdcursor agent` were
644
+ silently ignored when no TTY was attached. The config-reading path now
645
+ applies CLI flags before falling back to the config file on all entry
646
+ points.
647
+
648
+ ### Fixed — keyboard / typing / open_app could hang over MCP (tools-only)
649
+
650
+ Over `clawdcursor mcp` (stdio) and `agent --no-llm` (HTTP), `key_press`,
651
+ `type_text`, and `open_app` could hang indefinitely. Root cause: a latent
652
+ zombie-promise in the persistent PowerShell/UIA bridge runner — when the
653
+ bridge exited before signalling ready, the startup promise was never
654
+ settled, so any awaiter hung forever. The bridge now rejects and recovers,
655
+ and the cosmetic active-window lookup in `key_press`/`type_text` is
656
+ time-boxed so a slow or recovering bridge can never block a keystroke. The
657
+ full LLM agent path was unaffected.
658
+
659
+ ### Changed — retired hardcoded in-app choreography constants
660
+
661
+ Per-app tab-order and keystroke constants (e.g. `tabsAfterRecipient`) are
662
+ removed; the pipeline derives sequencing from live accessibility-tree
663
+ inspection instead.
664
+
665
+ ## [0.9.9] - 2026-05-24 — security hardening + registry perf
666
+
667
+ ### Security — AppleScript backslash escaping + crypto host token (PR #136)
668
+
669
+ From a full triage of the open CodeQL alerts (only 2 were genuine; the
670
+ other 20 were by-design for a local single-user tool and were dismissed
671
+ with justifications):
672
+
673
+ - **AppleScript injection (CodeQL #61–64, HIGH).**
674
+ `buildMacWindowTargetClause` escaped `"` but not `\` before embedding
675
+ `processName`/`title` into an `osascript -e` double-quoted string. `\` is
676
+ an AppleScript escape character and these fields are LLM/screen-supplied,
677
+ so a value containing a backslash could break out of the string literal.
678
+ Now escapes `\` then `"` at all four sites (macOS-only path).
679
+ - **Host-helper token (CodeQL #77, HIGH).** Replaced `Math.random()` (not
680
+ cryptographically secure) with `crypto.randomBytes(24)`, and the
681
+ check-then-write with an exclusive create (`flag: 'wx'`) that reads the
682
+ existing token on `EEXIST` — closing a TOCTOU window.
683
+
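The escaping order described in the AppleScript item above can be sketched as follows. The helper names are hypothetical; the technique (escape `\` first, then `"`) is the one the entry states.

```typescript
// Escape a value for embedding in a double-quoted AppleScript string.
function escapeAppleScriptString(value: string): string {
  // Order matters: escape backslashes first so the backslashes added
  // for quotes in the second pass are not doubled.
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

function quoteForAppleScript(value: string): string {
  return `"${escapeAppleScriptString(value)}"`;
}
```

Escaping only the quote would leave a trailing backslash free to escape the closing delimiter, which is the break-out the entry describes.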
684
+ ### Performance — memoize the granular tool registry (PR #116)
685
+
686
+ `getTool(name)` resolved via `getAllTools().find(...)` and `getTools()`
687
+ re-spread all 14 `get*Tools()` sources on every call, so every single-tool
688
+ lookup (the dispatch hot path) rebuilt the entire registry. The granular
689
+ definitions are static, so they're now assembled once and cached;
690
+ `getTools()`/`getAllTools()` still return fresh copies (mutation-safe), and
691
+ `getTool()` searches the cache directly. No behavior change.
692
+
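The build-once, copy-on-read pattern described above can be sketched like this. `createRegistry`, `ToolDef`, and `buildCount` are hypothetical names for illustration. The copy is shallow, so it protects the cached list from `push`/`splice` but not from edits to the tool objects themselves.

```typescript
interface ToolDef {
  name: string;
  description: string;
}

function createRegistry(sources: Array<() => ToolDef[]>) {
  let cache: ToolDef[] | null = null;
  let builds = 0;

  // Assemble the registry on first use only.
  const all = (): ToolDef[] => {
    if (cache === null) {
      cache = sources.reduce<ToolDef[]>((acc, source) => acc.concat(source()), []);
      builds++;
    }
    return cache;
  };

  return {
    // Fresh array on every call so callers cannot mutate the cache.
    getAllTools: (): ToolDef[] => [...all()],
    // The hot path searches the cache directly, without copying.
    getTool: (name: string): ToolDef | undefined => all().find((t) => t.name === name),
    buildCount: (): number => builds,
  };
}
```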
693
+ ## [0.9.8] - 2026-05-24 — complete the Toolbox + registry metadata + site refresh
694
+
695
+ ### Added — smart_* and URI escape hatches reach the compound Toolbox (PR #135)
696
+
697
+ Three useful granular tools were orphaned from the recommended 6-tool
698
+ compound surface; they're now wired in (cross-OS — each underlying tool was
699
+ already cross-platform, this only changes dispatch):
700
+
701
+ - **`accessibility`** gains `smart_click` / `smart_type` / `smart_read` —
702
+ auto-fallback OCR → a11y → CDP by element text, no coordinates.
703
+ - **`system`** gains `open_uri` / `build_uri` / `learn_app` — the URI escape
704
+ hatches (`mailto:` `tel:` `slack:` `vscode:` `spotify:` `file:` …) that
705
+ accomplish an intent without driving UI, plus a guide-write companion to
706
+ `app_guide`. `open_uri` dispatches via macOS `open`, Linux `xdg-open`, and
707
+ Windows registered-handler resolution.
708
+
709
+ Safety: `safety.ts` gains matching `publicCompoundMap` + `TOOL_TIER` entries
710
+ so the new actions gate correctly on the compound path (`open_uri` /
711
+ `learn_app` → destructive, `build_uri` → read), not the `input` default.
712
+
713
+ ### Changed — npm registry metadata (PR #132)
714
+
715
+ Added `mcpName: io.github.AmrDab/clawdcursor` (for the official MCP
716
+ registry), refreshed the stale package description to the current
717
+ local-MCP-server / fallback-layer positioning, and added `mcp-server` /
718
+ `gui-automation` keywords.
719
+
720
+ ### Changed — website refresh (PR #134)
721
+
722
+ Hero headline restored to "A cursor and a keyboard for any AI agent";
723
+ install section rebuilt as a segmented tab bar (`npm` · Windows · macOS/Linux
724
+ · Source) with npm a first-class option; tool-surface labels aligned to the
725
+ README's Toolbox / Tools naming.
726
+
727
+ ### Fixed — CI: mcp-orphan-teardown flake on Windows (PR #133)
728
+
729
+ The test is no longer skipped on Windows (the platform the orphan bug it
+ guards lived on). It runs with a 20s exit budget instead of 5s, tolerating
+ slow native-module teardown on Windows runners while still catching a
+ genuine hang. The earlier Node-20-only skip wrongly assumed Node 22 was
+ immune.
734
+
735
+ ## [0.9.7] - 2026-05-23 — GUI reliability + safety/efficiency tuning + npm install
736
+
737
+ First release published to **npm** — `npm i -g clawdcursor` now works on
738
+ any OS. Bundles the fixes that landed on `main` after v0.9.6.
739
+
740
+ ### Fixed — Save As dialog reliability on Windows (PR #128, #122 + #123)
741
+
742
+ - **`set_field_value` on a ComboBox+Edit composite** (e.g. the Save As
743
+ filename field) returned `set_field_value failed for 'undefined'`. Fixed
744
+ with a PS-level inner-Edit-child retry plus a TS keyboard fallback that
745
+ targets the widest-bounds element sharing the name (the input, not the
746
+ label) when ValuePattern is absent (Win11 XAML dialogs).
747
+ - **Clicks could land on a background window** when a dialog sat over
748
+ another window (focus/DPI race). `WindowsAdapter.mouseClick` now calls
749
+ `ensureForegroundAtPoint(x, y)` first — `WindowFromPoint` →
750
+ `GetAncestor(GA_ROOT)`, a no-op fast path when already foreground, else
751
+ the `AttachThreadInput` + `SetForegroundWindow` dance to beat the
752
+ Windows foreground lock.
753
+ - #121 (triple_click in Save As) was reviewed and intentionally **not**
754
+ changed: `mouse_triple_click` is documented as "selects a paragraph",
755
+ so rerouting it to Ctrl+A globally would break that contract elsewhere.
756
+
757
+ ### Fixed — safety gate no longer flags typed prose (PR #127, #124)
758
+
759
+ The destructive-label patterns (`\bsend\b`, `\bconfirm\b`, …) are meant
760
+ for the label of a control being *activated* (clicked/invoked), but the
761
+ MCP gate also matched them against the `text` payload of typing tools.
762
+ Typing "…verification to confirm reliable automation" tripped a confirm
763
+ gate. Fixed by skipping the patterns for typing canonical tools
764
+ (`type_text`, `cdp_type`) via a `TYPING_TOOLS` denylist — click/invoke
765
+ label safety (incl. `cdp_click` by visible text) is fully preserved.
766
+
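A minimal sketch of the exemption described above, assuming the gate is a single predicate over a tool name and a string. Only `TYPING_TOOLS`, `type_text`, `cdp_type`, `cdp_click`, and the two label patterns come from the entry; `needsConfirmation` and `DESTRUCTIVE_LABEL_PATTERNS` are hypothetical names.

```typescript
// Hypothetical sketch: function and constant names (other than TYPING_TOOLS)
// are illustrative, not the project's actual API.
const DESTRUCTIVE_LABEL_PATTERNS: RegExp[] = [/\bsend\b/i, /\bconfirm\b/i];

// Tools whose `text` payload is prose being typed, not the label of a control.
const TYPING_TOOLS = new Set<string>(["type_text", "cdp_type"]);

function needsConfirmation(tool: string, labelOrText: string): boolean {
  // Typed prose is never matched against the destructive-label patterns.
  if (TYPING_TOOLS.has(tool)) return false;
  return DESTRUCTIVE_LABEL_PATTERNS.some((p) => p.test(labelOrText));
}
```

In this sketch the click/invoke path is untouched: a `cdp_click` on a control labeled "Confirm" still trips the gate, while typing a sentence that contains "confirm" does not.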
767
+ ### Added — explicit token-cost hierarchy in the agent prompt (PR #129)
768
+
769
+ `buildSystemPrompt` (also served to external agents via
770
+ `get_system_prompt`) now states the cost ladder so any agent climbs
771
+ cheap→expensive deliberately: act (click/type/key) < inspect
772
+ (find_element/get_element) < read a11y tree / OCR (read_screen) <
773
+ screenshot. Reinforces "read the attached a11y snapshot before spending
774
+ a screenshot."
775
+
776
+ ### Security — qs DoS bump (PR #126)
777
+
778
+ `qs` 6.14.2 → 6.15.2 (transitive via express/supertest) — patches a
779
+ remotely-triggerable `qs.stringify` DoS.
780
+
781
+ ### Added — npm install + website/README npm one-liner
782
+
783
+ `clawdcursor` is now published to npm. README Quickstart and the website
784
+ Install section lead with `npm i -g clawdcursor` (with the macOS
785
+ native-helper note); the OS installer scripts remain for the
786
+ clone-build-link path that handles the macOS native build automatically.
787
+
788
+ ## [0.9.6] - 2026-05-22 — key_press crash fix + auth-hardening + docs catchup + CI stabilization
789
+
790
+ ### Fixed — `key_press` crashed on non-printable keys (PR #125, fixes #120)
791
+
792
+ A live test driving the compact MCP surface end-to-end (Outlook email +
793
+ Paint drawing, tools only) surfaced that `computer.key` /
794
+ `key_press` threw `Cannot read properties of undefined (reading
795
+ 'toLowerCase')` on `Backspace`, `Enter`, `Tab`, `Delete`, and `Ctrl+*`
796
+ combos. Root cause: `normalizeKey()` in `src/platform/keys.ts`
797
+ called `.toLowerCase()` on its argument without guarding against
798
+ non-string / empty input, so any code path that reached it with an
799
+ unexpected value crashed instead of degrading gracefully.
800
+
801
+ `normalizeKey()` now validates its input and throws a clear,
802
+ debuggable error (`expected a non-empty string`) instead of a cryptic
803
+ `TypeError`; `native-desktop.ts` guards the parsed-key path the same
804
+ way. The fix sits on the shared `NativeDesktop` path that
805
+ `computer.key` traverses on **all three platforms** (Windows, macOS,
806
+ Linux). Test coverage: 9 cases at
807
+ `src/__tests__/keys-normalization.test.ts` covering valid combos plus
808
+ empty/undefined/non-string inputs. Thanks to first-time contributor
809
+ @xxiaoxiong.
810
+
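The input guard might look roughly like the sketch below. The function name and the `expected a non-empty string` message come from the entry; the exact checks, the trimming, and the rest of the message are assumptions.

```typescript
// Hypothetical sketch; the real normalizeKey() in src/platform/keys.ts may differ.
function normalizeKey(key: unknown): string {
  if (typeof key !== "string" || key.trim() === "") {
    // Fail with a debuggable message instead of a TypeError from .toLowerCase().
    throw new Error(
      `normalizeKey: expected a non-empty string, got ${JSON.stringify(key)}`,
    );
  }
  return key.trim().toLowerCase();
}
```

Validating at the shared entry point means every caller gets the same error, whichever platform adapter reached it.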
811
+ ### Docs — `Toolbox` / `Tools` naming + restored action-enum tables (PR #111)
812
+
813
+ The repositioning in #93 inadvertently stripped the per-toolbox action
814
+ enum tables that v0.9.3 shipped. Readers landing on the post-v0.9.4
815
+ README saw vague descriptions like *"computer — Mouse, keyboard,
816
+ screenshot. Raw I/O."* with no way to discover the ~70 verbs each
817
+ compound tool actually exposes short of querying `tools/list`. The
818
+ tables are restored verbatim from v0.9.3, and the two sections are
819
+ labeled **`Toolbox` — 6 compound tools (recommended)** and **`Tools`
820
+ — 97 granular primitives** to make the catalog choice unambiguous.
821
+
822
+ ### Security — dashboard cookie auth instead of inline-JS token injection
823
+
824
+ The dashboard at `/` no longer injects the bearer token into client
825
+ JS. The previous flow set `var __TOKEN = '__CLAWD_TOKEN_PLACEHOLDER__'`
826
+ in the served HTML so dashboard JS could send `Authorization: Bearer`
827
+ on `/mcp` calls — which meant any future XSS, a malicious browser
828
+ extension, or a host misbind to a non-loopback address could exfiltrate
829
+ the live token and execute the full MCP tool catalog.
830
+
831
+ The server now sets `clawdcursor_token` as an `httpOnly` + `sameSite:
832
+ strict` cookie when serving `/`. Dashboard JS no longer carries the
833
+ token at all; `fetch('/mcp', …)` relies on the browser auto-attaching
834
+ the cookie on same-origin requests. The auth gate at
835
+ `src/surface/http-utility.ts` accepts both `Authorization: Bearer`
836
+ headers (used by external tooling) and the cookie (used by the
837
+ dashboard) — backward-compatible for any script that authenticates by
838
+ header.
839
+
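As a rough illustration of a gate that accepts either credential, the sketch below pulls a token from the `Authorization: Bearer` header first and falls back to the `clawdcursor_token` cookie. The `HeaderMap` type and `extractToken` function are hypothetical; only the header form and cookie name come from the entry.

```typescript
// Hypothetical sketch: type and function names are illustrative.
type HeaderMap = Record<string, string | undefined>;

function extractToken(headers: HeaderMap): string | undefined {
  // External tooling: explicit bearer header.
  const auth = headers["authorization"];
  if (auth && auth.startsWith("Bearer ")) return auth.slice("Bearer ".length);

  // Dashboard: cookie attached automatically by the browser on same-origin requests.
  const cookie = headers["cookie"] ?? "";
  for (const part of cookie.split(";")) {
    const [name, ...rest] = part.trim().split("=");
    if (name === "clawdcursor_token") return rest.join("=");
  }
  return undefined;
}
```

Because the cookie is `httpOnly`, page scripts cannot read it, which is what removes the token from the reach of client-side JS.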
840
+ ### Security — `requireAuth` no longer silently accepts on-disk token rotation by default
841
+
842
+ `requireAuth` previously fell back to reading `~/.clawdcursor/token`
843
+ when the incoming token didn't match the in-memory token. That allowed
844
+ any process with write access to that file to rotate the auth token
845
+ and gain MCP access immediately without restarting the daemon.
846
+
847
+ Drift acceptance is now opt-in via `CLAWD_ALLOW_DISK_TOKEN_DRIFT=1`.
848
+ The default is fail-closed: a request whose token doesn't match the
849
+ in-memory token is rejected, regardless of what's on disk.
850
+
851
+ **Backward-incompatible** for any tooling that rotated the disk token
852
+ to authenticate against a running daemon. Set
853
+ `CLAWD_ALLOW_DISK_TOKEN_DRIFT=1` to restore the previous behavior.
854
+
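The fail-closed default can be sketched as a pure function. `CLAWD_ALLOW_DISK_TOKEN_DRIFT` is the real variable named above; `isAuthorized` and its parameters are hypothetical, and a production check would likely use a constant-time comparison rather than `===`.

```typescript
// Hypothetical sketch: isAuthorized and its parameters are illustrative.
function isAuthorized(
  incoming: string,
  memoryToken: string,
  diskToken: string | undefined,
  env: Record<string, string | undefined>,
): boolean {
  if (incoming === memoryToken) return true;
  // Fail closed unless the operator explicitly opts in to disk-token drift.
  if (env["CLAWD_ALLOW_DISK_TOKEN_DRIFT"] !== "1") return false;
  return diskToken !== undefined && incoming === diskToken;
}
```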
855
+ ### CI — global nut-js mock for Linux runners
856
+
857
+ `tests/vitest.setup.ts` wires a global mock for `@nut-tree-fork/nut-js`
858
+ so vitest can boot on Linux CI runners that don't have libXtst /
859
+ libxdo installed. Existing per-file `vi.mock('@nut-tree-fork/nut-js',
860
+ …)` declarations continue to override the global, so no existing
861
+ test behavior changes. Method names in the global mock match
862
+ production usage in `src/platform/native-desktop.ts` (`mouse.click`,
863
+ `screen.grabRegion`, etc.) so the global is a usable fallback for
864
+ new tests.
865
+
866
+ ### CI — skip `mcp-orphan-teardown` on Windows + Node 20.x (PR #118)
867
+
868
+ `tests/mcp-orphan-teardown.test.ts` failed intermittently on the
869
+ `windows-latest / Node 20.x` matrix slot — always with `process did
870
+ not exit within 5000ms`, always passing on rerun. Same failure family
871
+ as the existing headless-Linux skip: `clawdcursor mcp` loads heavy
872
+ native modules (nut-js, sharp's libvips, playwright) whose teardown
873
+ doesn't finish within the 5s exit budget on Node 20 specifically
874
+ (Node 22.x tightened process-exit semantics, so the contract holds
875
+ there). The test now skips on Windows + Node 20.x, preserving coverage
876
+ on macOS, Linux-with-display, and Windows + Node 22.x.
877
+
878
+
879
+ ## [0.9.5] - 2026-05-21 — repositioning + compact `task` fix + macOS Tahoe silent screenshots + npm publish prep
880
+
881
+ Three threads landed: a documentation reframe so the README finally
882
+ matches what the product actually is, a real ship-bug fix for one of
883
+ the six headline compact tools, and a macOS 26 Tahoe compatibility
884
+ fix. Also: package metadata is now npm-publish-ready.
885
+
886
+ ### Added — README + homepage repositioning (PR #93)
887
+
888
+ After v0.9.4's live tests confirmed external LLMs (Sonnet driving the
889
+ compact MCP surface) consistently passed real tasks via the MCP
890
+ catalog, the documentation now leads with that fact instead of the
891
+ "skill, not an app" framing.
892
+
893
+ - Old tagline: *"A cursor and a keyboard for any AI agent on a real desktop."*
894
+ - New tagline: **"The local MCP server that gives any agent safe desktop control."**
895
+
896
+ Above-the-fold opening triplet now names the three defensible
897
+ architectural claims: **no cloud / no telemetry by default**, **single
898
+ `safety.evaluate()` chokepoint** every tool call routes through, and
899
+ **bearer-token auth on every HTTP request**. Homepage (docs/index.html)
900
+ mirrors the README changes.
901
+
902
+ ### Fixed — compact `task` compound returns `success: false` on success (PR #110)
903
+
904
+ The compact `task` action — one of the six headline tools — routes
905
+ through `delegate_to_agent`, which polls `agent_status` until idle
906
+ and then reads `data.lastResult` to report `{success, verified, steps,
907
+ lastAction}` to the caller.
908
+
909
+ But `AgentState` had no `lastResult` field (`src/types.ts:80`). After
910
+ `executeTask()` finished, the result was returned to the direct caller
911
+ but never written onto state. The poll-then-read path saw `undefined`
912
+ and reported `{success: false, steps: 0}` on every completed task —
913
+ including the successful ones. One of the six headline tools was
914
+ silently broken in v0.9.4.
915
+
916
+ Fix: `AgentState` now has `lastResult?: TaskResult`. `executeTask()`
917
+ snapshots the result onto `state.lastResult` immediately before
918
+ resolving. The field is cleared at task start so pollers can't read stale data
919
+ while a new task is in flight. Test coverage: 4 new tests at
920
+ `src/__tests__/agent-last-result.test.ts`.
921
+
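The snapshot-then-resolve pattern can be sketched as follows. `AgentState`, `TaskResult`, `lastResult`, and `executeTask` are names from the entry; the field shapes, the `status` field, and the `run` callback are simplifying assumptions.

```typescript
// Hypothetical sketch: shapes are simplified from the description above.
interface TaskResult { success: boolean; steps: number }
interface AgentState { status: "idle" | "running"; lastResult?: TaskResult }

async function executeTask(
  state: AgentState,
  run: () => Promise<TaskResult>,
): Promise<TaskResult> {
  state.status = "running";
  delete state.lastResult; // cleared so pollers never read a stale result
  try {
    const result = await run();
    state.lastResult = result; // snapshot for poll-then-read callers
    return result;
  } finally {
    state.status = "idle"; // pollers see idle only after the snapshot is written
  }
}
```

The ordering matters here: the result is written before the state goes idle, so a poller that waits for idle and then reads `lastResult` sees the finished task.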
922
+ ### Fixed — silent screenshots on macOS 14+ via ScreenCaptureKit (PR #109)
923
+
924
+ macOS 26 Tahoe added a "screen captured" white-flash animation that
925
+ fires whenever any process hits the screencapture coordinator daemon —
926
+ including the deprecated `CGWindowListCreateImage` API our
927
+ `ScreenshotHelper` was using. For an agent tool that screenshots
928
+ dozens of times per session, every flash was both visually disruptive
929
+ and a privacy signal users didn't need to see for legitimate
930
+ automation.
931
+
932
+ New `captureFullScreenSCK` + `captureWindowSCK` functions use
933
+ ScreenCaptureKit (macOS 14+) which Tahoe's flash hook does NOT
934
+ intercept. JSON output shape preserved byte-for-byte; deployment
935
+ target stays `.macOS(.v12)` via runtime version gate. Falls back to
936
+ the existing CG path on macOS 12-13 where CG is still silent.
937
+
938
+ ### Added — `prepare` script for clean npm publish
939
+
940
+ `package.json` now has `prepare: tsc && node dist/postbuild.js`. The
941
+ npm `prepare` lifecycle runs on `npm pack` / `npm publish`, so the
942
+ published tarball always reflects the current source rather than
943
+ shipping a stale `dist/` from the developer's last `npm run build`.
944
+
945
+ ### Fixed — installer no longer destroys user state on dirty tree (PR #108, backfilled to v0.9.5)
946
+
947
+ The `irm https://clawdcursor.com/install.ps1 | iex` and equivalent
948
+ `curl … | bash` paths previously did a `git checkout && git pull` and,
949
+ on any non-zero exit, ran `rm -rf $INSTALL_DIR` and re-cloned from
950
+ scratch. Any uncommitted work in the user's tree — feature branches,
951
+ dirty edits, untracked scratch files — was destroyed with no consent
952
+ and no recovery path. The error message also lied about the cause: a
953
+ dirty tree, a missing ref, or a diverged branch all surfaced as
954
+ "Download failed. Check your internet and try again."
955
+
956
+ Both installers now refuse to update a dirty tree, surface the real
957
+ `git` stderr on failure, and never delete `$INSTALL_DIR` without
958
+ explicit user action. `install.ps1` also dropped UTF-8 em-dashes in
959
+ comments to fix a Windows-PowerShell-5.1 ANSI-decoding parser issue.
960
+
961
+ ### Notes
962
+
963
+ - **macOS users installing via `npm i -g clawdcursor`**: the Swift
964
+ native helper (ClawdCursor.app) isn't pre-built in the npm tarball.
965
+ After install, run `cd $(npm root -g)/clawdcursor && bash native/build.sh && clawdcursor grant`
966
+ to build it. Or use the existing `curl … | bash` installer, which handles
967
+ this automatically. Fixing the npm-direct macOS path is on the
968
+ v0.9.6 list.
969
+ - Closed PR #94 (diagram improvements) — its scope was a subset of
970
+ #93's; the diagram updates folded in via the rebase.
971
+
972
+
973
+ ## [0.9.4] - 2026-05-20 — external-agent reliability + browser DOM reachability
974
+
975
+ Two threads of work landed: a batch of reliability fixes surfaced by
976
+ an end-to-end live test (Sonnet driving clawdcursor over MCP-HTTP
977
+ against the public benchmark exam at clawdcursor.com/tests), and the
978
+ first round of fixes to the external-agent UX gap that test exposed.
979
+
980
+ ### Live test summary
981
+
982
+ The exam at `192.168.1.127:8000` (14 desktop-control tasks: clicks,
983
+ drags, hover, double/right-click, typing, scroll-to-find, bezier path,
984
+ keyboard combo, multi-step workflow) was passed end-to-end by Sonnet
985
+ driving the compact MCP surface. Three runs:
986
+
987
+ - baseline (no hierarchy prompt): grade A, 39 screenshots, 2 a11y calls
988
+ - hierarchy prompted (no CDP fallback yet): grade A, 39 screenshots, 0 a11y successes — proved the underlying tools were canvas-blind
989
+ - post-CDP-fallback + `--compact`: ~20 CDP DOM hits including ★TARGET in the scroll-to-find task (saved ~285 wheel-scroll calls)
990
+
991
+ ### Added — `clawdcursor agent --compact` (PR #106)
992
+
993
+ Previously the 6-compound MCP surface (`computer`, `accessibility`,
994
+ `window`, `system`, `browser`, `task`) was only reachable via
995
+ `clawdcursor mcp --compact` (stdio, for editor integrations). The
996
+ HTTP-MCP daemon at `:3847/mcp` was hard-coded to serve all 97 granular
997
+ tools — which silently broke the README's "6 compact tools" pitch for
998
+ any external agent connecting over HTTP. `clawdcursor agent --compact`
999
+ (or `CLAWD_MCP_COMPACT=1`) now exposes the same compound surface over
1000
+ HTTP. Default stays granular because the daemon dashboard at `/` calls
1001
+ 9 granular tool names directly (`scheduled_task_*`, `agent_status`,
1002
+ `submit_task`, `favorites_*`, `logs_recent`) — flipping the default
1003
+ will follow once those calls migrate to the compound `system` action
1004
+ vocabulary.
1005
+
1006
+ ### Added — CDP DOM fallback in `find_element` + `read_screen` (PR #107)
1007
+
1008
+ Edge / Chrome UIA trees stop at browser chrome — single-page apps and
1009
+ in-page DOM widgets are invisible to pure UIA queries. When the focused
1010
+ window is a recognised browser and clawdcursor's CDP driver is
1011
+ connected, `find_element` and `read_screen` now also query the DOM via
1012
+ `document.querySelectorAll('a, button, input, …, [aria-label], [role]')`
1013
+ and fold the matches into the response. `find_element` flags CDP
1014
+ results with a `(via CDP DOM; coords are viewport-relative)` header;
1015
+ `read_screen` appends a `BROWSER DOM` section side-by-side with the
1016
+ UIA tree. The smart-layer (`smart_click` / `smart_read` / `smart_type`)
1017
+ already had this fallback; the granular tools that external agents
1018
+ prefer when explicitly told "a11y first" did not. Now they do.
1019
+
1020
+ **Known limit.** CDP DOM only sees standard HTML elements. Canvas-
1021
+ rendered content (shapes drawn via 2D context or WebGL) remains
1022
+ vision-only and requires `computer.screenshot` + pixel coordinates.
1023
+ This is a platform limit, not a tool limit — `querySelectorAll` cannot
1024
+ enumerate pixels.
1025
+
1026
+ ### Fixed — pipeline ladder climbs past rung LLM errors (PR #104)
1027
+
1028
+ `src/core/pipeline.ts` previously treated any "aborted" failure string
1029
+ as a hard user-abort, so a transient LLM timeout on the blind rung
1030
+ collapsed the whole chain — vision was effectively dead code on slow
1031
+ or flaky providers. Replaced the stringly-typed branch with a
1032
+ `RungFailureCategory` tagged-union (`user_abort` / `rung_llm_error` /
1033
+ `agent_gave_up` / `verifier_rejected` / `config_missing` /
1034
+ `anti_pattern` / `infra_error`) and a `categorizeFailureReason` mapper
1035
+ as the single source of truth. Chain-abort gate hard-aborts only on
1036
+ `user_abort`, `infra_error`, `anti_pattern`, or high-confidence
1037
+ `verifier_rejected`; everything else escalates to the next rung.
1038
+
1039
+ Verified live: pointing the daemon at an unreachable LLM URL produced
1040
+ `blind → hybrid → vision` rung attempts where the previous chain-abort
1041
+ gate stopped after rung 1. Also fixed a related phantom-success bug
1042
+ where aggregate accounting could mark a task `success: true` when every
1043
+ rung had failed with `rung_llm_error`. 4 integration tests +
1044
+ 7 mapper unit tests added at `src/__tests__/pipeline-chain-abort.test.ts`.
1045
+
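A compact sketch of the categorize-then-gate idea. The seven category names and the abort rule come from the entry; the substring matching inside `categorizeFailureReason`, the `shouldAbortChain` name, and the 0.9 "high-confidence" threshold are assumptions made for illustration.

```typescript
// Hypothetical sketch: matching rules and the confidence threshold are assumed.
type RungFailureCategory =
  | "user_abort" | "rung_llm_error" | "agent_gave_up" | "verifier_rejected"
  | "config_missing" | "anti_pattern" | "infra_error";

function categorizeFailureReason(reason: string): RungFailureCategory {
  const r = reason.toLowerCase();
  if (r.includes("aborted by user")) return "user_abort";
  if (r.includes("timeout") || r.includes("llm")) return "rung_llm_error";
  if (r.includes("verifier")) return "verifier_rejected";
  return "agent_gave_up";
}

function shouldAbortChain(category: RungFailureCategory, verifierConfidence = 0): boolean {
  switch (category) {
    case "user_abort":
    case "infra_error":
    case "anti_pattern":
      return true;
    case "verifier_rejected":
      return verifierConfidence >= 0.9; // assumed threshold
    default:
      return false; // everything else escalates to the next rung
  }
}
```

With a typed category, an LLM timeout whose message happens to contain "aborted" no longer looks like a user abort to the gate.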
1046
+ ### Fixed — blind-mode coordinate-click guardrail (PR #103)
1047
+
1048
+ The autonomous agent's blind rung (a11y-only, no screenshots) was
1049
+ emitting raw `mouse_click(x, y)` calls with hallucinated coordinates
1050
+ when the a11y tree didn't contain the LLM's target — a live test
1051
+ observed it walking through an exam UI by guessing positions until the
1052
+ verifier's 0.65-confidence rejection finally fired. New block at
1053
+ `src/core/agent-loop/agent.ts:531-587`: when `mode === 'blind'` and no
1054
+ a11y-aware selector (`invoke_element`, `set_field_value`,
1055
+ `focus_element`, `a11y_select`, `a11y_toggle`, `a11y_expand`,
1056
+ `a11y_collapse`, `wait_for_element`, `find_element`) succeeded in the
1057
+ prior 2 turns, raw coordinate clicks are refused with a structured
1058
+ tool-result that points the LLM at the recovery options
1059
+ (`cannot_read` or `screenshot`). 4 regression tests at
1060
+ `src/__tests__/blind-coord-click-guard.test.ts`.
1061
+
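The guard's decision can be sketched as a predicate over recent turn history. The selector list and the two-turn window come from the entry; the `Turn` shape and the `refuseBlindCoordClick` name are hypothetical.

```typescript
// Hypothetical sketch: the history shape and function name are assumptions.
const A11Y_SELECTORS = new Set<string>([
  "invoke_element", "set_field_value", "focus_element", "a11y_select",
  "a11y_toggle", "a11y_expand", "a11y_collapse", "wait_for_element", "find_element",
]);

interface Turn { tool: string; ok: boolean }

function refuseBlindCoordClick(mode: string, tool: string, history: Turn[]): boolean {
  if (mode !== "blind" || tool !== "mouse_click") return false;
  // Allow the click only if an a11y-aware selector succeeded in the prior 2 turns.
  const recent = history.slice(-2);
  return !recent.some((t) => t.ok && A11Y_SELECTORS.has(t.tool));
}
```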
1062
+ ### Fixed — CLI `--text-model` / `--api-key` / `--base-url` ignored (PR #105)
1063
+
1064
+ The boot banner read these flags through `resolveConfig`
1065
+ (`src/llm/config.ts:203`) and proudly printed
1066
+ `Using externally configured models: text=X`, but the runtime agent
1067
+ loop read from `loadPipelineConfig` (`src/surface/doctor.ts:1636`)
1068
+ which only consulted `.clawdcursor-config.json` — so the very next log
1069
+ line was `pipeline.start … models=text=off`. `loadPipelineConfig` now
1070
+ accepts an optional `ResolvedConfig` overlay; fields tagged
1071
+ `source === 'cli'` override disk values. Precedence preserved
1072
+ (CLI > project > user > env > autodetect > default). The contradictory
1073
+ double banner (`No AI providers found` immediately followed by
1074
+ `Using externally configured models`) is also gone — the
1075
+ auto-detection branch is skipped when CLI flags already supply LLM
1076
+ wiring. 5 regression tests at
1077
+ `src/__tests__/load-pipeline-config-overlay.test.ts`.
1078
+
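The overlay rule (only fields tagged `source === 'cli'` override disk values) might be sketched like this. The `ResolvedConfig` shape shown here and the `applyCliOverlay` helper are assumptions; the entry only says that `loadPipelineConfig` accepts an optional `ResolvedConfig` overlay.

```typescript
// Hypothetical sketch: the ResolvedConfig shape and helper name are assumed.
type ConfigSource = "cli" | "project" | "user" | "env" | "autodetect" | "default";
interface ResolvedField { value: string; source: ConfigSource }
type ResolvedConfig = Record<string, ResolvedField>;

function applyCliOverlay(
  disk: Record<string, string>,
  overlay?: ResolvedConfig,
): Record<string, string> {
  const merged: Record<string, string> = { ...disk };
  if (!overlay) return merged;
  for (const key of Object.keys(overlay)) {
    const field = overlay[key];
    // Only CLI-sourced fields win over what was read from disk.
    if (field && field.source === "cli") merged[key] = field.value;
  }
  return merged;
}
```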
1079
+ ### Fixed — `smart_click` candidates + macOS multi-window + open_url tier + a11y description fallback (PR #102, closes #101)
1080
+
1081
+ Four fixes for issue #101:
1082
+
1083
+ - `smart_click` now returns a structured failure payload
1084
+ `{error, reason, target, candidates, tried, elapsedMs, isError: true}`
1085
+ instead of bare timeout strings. Callers that hit an ambiguous target
1086
+ can disambiguate from the candidate list; deadline-aware budget
1087
+ replaces the bare `Promise.race` that previously swallowed diagnostic
1088
+ state. New tests at `src/__tests__/smart-tools.test.ts`.
1089
+
1090
+ - macOS `focus_window` now disambiguates among multiple windows of the
1091
+ same process by title — `scripts/mac/_window-picker.jxa` plus a
1092
+ `scoreWindow()` heuristic that deprioritises tray-style popovers
1093
+ (Xcode "Downloads", etc.).
1094
+
1095
+ - `open_url` was filtered out of the act-only safety tier; the
1096
+ `safetyTier: 2 → 1` change in `src/tools/extras.ts:523` restores it.
1097
+
1098
+ - A11y element labels now fall back through `name → description →
1099
+ value → ''` so macOS apps that put their visible text in
1100
+ `AXDescription` (Xcode, others) render with something meaningful
1101
+ instead of `"missing value"`. `formatElement()` helper in
1102
+ `src/tools/a11y.ts:25-30`.
1103
+
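The label fallback chain from the last item above can be sketched as below. The `name → description → value → ''` order is from the entry; the `A11yNode` type, the `elementLabel` name, and treating the literal string `"missing value"` as absent are assumptions.

```typescript
// Hypothetical sketch: the real formatElement() helper may handle more cases.
interface A11yNode { name?: string; description?: string; value?: string }

function elementLabel(el: A11yNode): string {
  // Treat empty strings and the "missing value" placeholder as absent (assumed).
  const usable = (s?: string): string | undefined =>
    s && s !== "missing value" ? s : undefined;
  return usable(el.name) ?? usable(el.description) ?? usable(el.value) ?? "";
}
```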
1104
+ ### Repo hygiene
1105
+
1106
+ Closed security-audit issue #13 with the per-commit fix-mapping comment.
1107
+ Rejected SafeSkill scanner PR #92 (the 20/100 "Blocked" badge was
1108
+ based on a heuristic that flags ANSI terminal color escapes as
1109
+ obfuscated content — see `src/surface/cli.ts`, `src/surface/doctor.ts`,
1110
+ etc. for the 58+ legitimate ANSI escapes). Closed issue #101.
1111
+
1112
+ Five dependabot bumps landed: `tsx` 4.21→4.22, `ws` 8.20.0→8.20.1
1113
+ (security patch), `croner` 9→10 (major, breaking change does not
1114
+ affect this codebase — only `?` wildcard semantics changed),
1115
+ `eslint` group +3 updates, `@types/node` 25.7→25.9.
1116
+
1117
+
1118
+ ## [0.9.3] - 2026-05-16 — tool-layer fixes + live-test report
1119
+
1120
+ Three critical tool-layer fixes surfaced by a deep audit + a Windows
1121
+ encoding bug spotted during an end-to-end live test (run by an LLM
1122
+ driving the compact MCP surface from Claude Code). Also: README hero
1123
+ no longer leads with "fallback only" framing — that discipline stays
1124
+ in SKILL.md (where it belongs for AI agents) and in a new "When NOT
1125
+ to use it" section in the README body.
1126
+
1127
+ ### Fixed — Linux SIGSEGV on MCP stdin teardown (carried from `3fc76b8`)
1128
+
1129
+ Calling `process.exit()` synchronously inside a stdin `'end'` event
1130
+ handler segfaulted on Linux because libuv was still unwinding the
1131
+ stream read handle. `releaseMcp` now guards against double-fire and
1132
+ defers exit via `setImmediate`. Fixes the cross-platform CI on
1133
+ ubuntu-latest (Node 20 + 22).
1134
+
1135
+ ### Fixed — `navigate_browser` PowerShell shell injection (Win32 branch)
1136
+
1137
+ `src/tools/orchestration.ts` interpolated the URL into a
1138
+ `Start-Process … -ArgumentList @(…,"${url}")` PowerShell command. A URL
1139
+ containing `")` or `$()` or backticks could escape the quoting and
1140
+ execute arbitrary PowerShell. Replaced with a direct `execFile()`
1141
+ against `msedge.exe` resolved from standard install locations — no
1142
+ shell shim, argv is safe. macOS and Linux branches already used
1143
+ argv-form `execFile` and were not affected.
1144
+
1145
+ ### Fixed — `screenshot_full` MIME type lied
1146
+
1147
+ `src/tools/agent.ts` declared `mimeType: 'image/png'` and described the
1148
+ output as base64 PNG, but `captureForLLM()` returns JPEG by default
1149
+ (or PNG only when `CLAWD_SCREENSHOT_FORMAT=png`). Any client that
1150
+ decoded the bytes by the advertised type silently got corrupted output. The
1151
+ `image.mimeType` field now follows the actual `frame.format`; a new
1152
+ `format` field in the metadata block lets clients double-check.
1153
+
1154
+ ### Fixed — `learn_app` silent no-op
1155
+
1156
+ The handler returned `{saved: true}` even when neither save branch
1157
+ executed (e.g., the caller supplied only `processName`). Now tracks
1158
+ `wroteLesson`/`wroteGuide` flags and returns
1159
+ `{saved: false, reason: …, isError: true}` when nothing was persisted.
1160
+ New regression-guard test at `agent-tools.test.ts`.
1161
+
1162
+ ### Fixed — Windows window-title UTF-8 corruption
1163
+
1164
+ Confirmed live: every `window.list`/`window.active` call returned
1165
+ non-ASCII characters in window titles as `?` or `�` (the Unicode
1166
+ replacement character). Root cause: `scripts/ps-bridge.ps1` and
1167
+ `scripts/ocr-recognize.ps1` did not set `[Console]::OutputEncoding`,
1168
+ so PowerShell wrote in the system code page (Windows-1252 in most
1169
+ locales) while Node decoded as UTF-8. Both scripts now force UTF-8 on
1170
+ stdin/stdout and `$OutputEncoding`. Same fix benefits OCR text capture
1171
+ of non-ASCII content (emoji, accented characters, CJK).
1172
+
1173
+ ### Fixed — compact `direction` enum dropped `scroll_horizontal` values
1174
+
1175
+ `buildCompoundSchema` in `src/tools/compact.ts` was first-wins on
1176
+ field names across delegates: `mouse_scroll` declared
1177
+ `direction: ['up','down']` first and won, so `mouse_scroll_horizontal`'s
1178
+ `['left','right']` was silently invisible on the compact surface. An
1179
+ LLM calling `computer({action:'scroll_horizontal', direction:'left'})`
1180
+ was violating the published schema. The merge now unions enum values
1181
+ across delegates.
1182
+
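A minimal sketch of a union merge, as opposed to first-wins, over same-named enum fields. This is not the actual `buildCompoundSchema` code; `EnumFields` and `mergeEnums` are hypothetical names, and real schemas carry more than enum arrays.

```typescript
// Hypothetical sketch of the merge rule only.
type EnumFields = Record<string, string[]>;

function mergeEnums(delegates: EnumFields[]): EnumFields {
  const merged: EnumFields = {};
  for (const fields of delegates) {
    for (const name of Object.keys(fields)) {
      const seen = merged[name] ?? [];
      for (const v of fields[name] ?? []) {
        if (seen.indexOf(v) === -1) seen.push(v); // union, preserving first-seen order
      }
      merged[name] = seen;
    }
  }
  return merged;
}
```

Merging the `mouse_scroll` and `mouse_scroll_horizontal` declarations this way yields a `direction` enum containing all four values.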
1183
+ ### Improved — `task` and `delegate_to_agent` descriptions lead with the daemon requirement
1184
+
1185
+ Both tools return ECONNREFUSED (or "no agent") when called from a
1186
+ stdio MCP host (Cursor, Claude Code, Windsurf) because they HTTP-call
1187
+ `127.0.0.1:3847/mcp` on the daemon. Their descriptions now lead with
1188
+ **Requires the `clawdcursor agent` daemon to be running** and tell
1189
+ the consumer how to start it.
1190
+
1191
+ ### Repositioned — README hero
1192
+
1193
+ The "Use as a fallback, not first choice" callout no longer sits in
1194
+ the README hero. The same discipline stays in SKILL.md (the AI-facing
1195
+ manual, where it correctly disciplines agent behavior) and in a new
1196
+ "When NOT to use it" subsection inside README's `Why Clawd Cursor`
1197
+ block. The hero now leads with what it does. SKILL.md frontmatter is
1198
+ unchanged — it still leads with the strict 4-gate for agent
1199
+ consumers.
1200
+
1201
+ ### Added — live test report
1202
+
1203
+ `docs/internal/0.9.2-live-test-2026-05-16.md` documents a full
1204
+ end-to-end test of clawdcursor 0.9.2, run by an LLM consuming the
1205
+ compact MCP surface from Claude Code. Covers every compact compound,
1206
+ the HTTP MCP transport via a parallel daemon, what worked, what
1207
+ surprised, what's broken. Reference artifact for the trust story —
1208
+ something a curious visitor can read to see "yes, this has been
1209
+ actually tested by an AI agent driving a real desktop."
1210
+
1211
+ ### Internal — security audit reply draft
1212
+
1213
+ `docs/internal/issue-13-reply-draft.md` is a draft response to the
1214
+ long-open security audit issue, listing what has landed in 0.9.x to
1215
+ address each item. Maintainer reviews + edits + posts to GitHub.
1216
+
1217
+ ### Test coverage
1218
+
1219
+ 51 test files, 813 tests pass (was 812 — `+1` new regression guard for
1220
+ `learn_app`'s no-payload case). Typecheck clean. Lint stable at 18
1221
+ pre-existing warnings.
1222
+
1223
+ ## [0.9.2] - 2026-05-15 — reliability + scanner-friendliness
1224
+
1225
+ Multiple fixes and a refactor consolidated into one release.
1226
+
1227
+ ### Fixed — recycled-PID false positives in single-instance lock
1228
+
1229
+ User-reported on Windows 11 + Claude Code: `/mcp` reconnect
1230
+ intermittently failed with `Failed to reconnect to clawdcursor: -32000`,
1231
+ and once it broke, every subsequent reconnect failed too — until the
1232
+ user manually killed zombie node processes and
1233
+ `rm ~/.clawdcursor/mcp.pid`.
1234
+
1235
+ `isProcessAlive(pid)` used `process.kill(pid, 0)`, which on Windows is
1236
+ fooled by PID recycling: once the dead clawdcursor's PID was reassigned
1237
+ to any other live process (chrome, svchost, anything), the lockfile
1238
+ permanently looked "live" and refused all future spawns. The lockfile
1239
+ also stored only a bare integer PID, leaving no way to disambiguate.
1240
+
1241
+ `~/.clawdcursor/{start,mcp,serve}.pid` is now JSON with schema version,
1242
+ PID, **process start time**, and mode. `claimPidFile` requires the
1243
+ recorded start time to match the OS-reported start time of the live PID
1244
+ (±5 s tolerance for OS reporting jitter) before treating it as a real
1245
+ duplicate. Implementation extracted to `src/surface/pidfile.ts` with
1246
+ unit-test coverage. Legacy bare-integer lockfiles are treated as stale
1247
+ on first read (silent backwards-compat — the old format can't be
1248
+ trusted anyway).
1249
+
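The start-time comparison can be sketched as a small predicate. The ±5 s tolerance and the record's contents (schema version, PID, start time, mode) come from the entry; the field names, the millisecond unit, and `isSameProcess` are assumptions.

```typescript
// Hypothetical sketch: field names and helper signature are assumptions.
interface PidRecord { v: number; pid: number; startTimeMs: number; mode: string }

const START_TIME_TOLERANCE_MS = 5000;

// liveStartTimeMs is the OS-reported start time of whatever process currently
// holds record.pid, or undefined if no such process exists.
function isSameProcess(record: PidRecord, liveStartTimeMs: number | undefined): boolean {
  if (liveStartTimeMs === undefined) return false;
  return Math.abs(record.startTimeMs - liveStartTimeMs) <= START_TIME_TOLERANCE_MS;
}
```

A recycled PID belongs to a process that started at a different time, so the lock is treated as stale even though the PID is alive.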
1250
+ ### Fixed — orphan MCP processes block reconnect
1251
+
1252
+ When an editor host exited without reaping its `clawdcursor mcp` child,
1253
+ the orphan kept running with no usable stdio but legitimately matched
1254
+ the lockfile. The `mcp` command now treats stdin EOF / close / error as
1255
+ a hard exit signal: when the parent's stdio pipe closes, the orphan
1256
+ releases its lockfile and exits cleanly. Deterministic on every
1257
+ platform — no polling, no parent-PID inspection.
1258
+
1259
+ ### Fixed — `clawdcursor uninstall` silently failed to kill running processes
1260
+
1261
+ The uninstall command's pidfile fallback (`src/surface/cli.ts`) still
1262
+ parsed the lockfile with `parseInt`, which against the new JSON format
1263
+ (`{"v":1,...}`) returns `NaN`, silently skipping the kill. A user
1264
+ running `clawdcursor uninstall` while a clawdcursor process was alive
1265
+ would end up with deleted config + orphaned process. Now uses the
1266
+ shared `readPidLoose` helper that handles both new JSON and legacy
1267
+ bare-int formats.
1268
+
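A loose reader that accepts both lockfile formats might look like the sketch below. `readPidLoose` is the helper named above, but this body is an assumption, as is the `pid` field name inside the JSON.

```typescript
// Hypothetical sketch: the real readPidLoose helper may validate more fields.
function readPidLoose(contents: string): number | undefined {
  try {
    const parsed: unknown = JSON.parse(contents.trim());
    // Legacy format: a bare integer, which is also valid JSON.
    if (typeof parsed === "number" && Number.isInteger(parsed)) return parsed;
    // New format: a JSON object carrying the PID (field name assumed).
    if (typeof parsed === "object" && parsed !== null) {
      const pid = (parsed as { pid?: unknown }).pid;
      if (typeof pid === "number" && Number.isInteger(pid)) return pid;
    }
  } catch {
    // Unparseable contents fall through to undefined.
  }
  return undefined;
}
```

By contrast, `parseInt` applied to a string that starts with `{` returns `NaN`, which is how the kill step was silently skipped.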
1269
+ ### Fixed — dashboard credential redaction silently broken since 0.7.x
1270
+
1271
+ `looksLikeCredential` in `src/surface/dashboard.ts` is supposed to
1272
+ hide password-shaped strings (`password: secret`, `Bearer xxxx`, etc.)
1273
+ from the task-history UI. The patterns were declared inside an outer JS
1274
+ template literal, so the single backslashes in `\s` and `\S` were
1275
+ silently dropped at parse time — the runtime regex matched literal `s`
1276
+ and `S` characters instead of whitespace. **No password the regex was
1277
+ designed to catch was actually being caught.** Patterns now use `\\s` /
1278
+ `\\S` in source so the emitted JS gets the correct escapes; verified
1279
+ end-to-end with a runtime regex eval.
1280
+
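The escaping pitfall is easy to reproduce in isolation. In the sketch below the pattern text is hypothetical (not the dashboard's actual regex); it only shows how a template literal treats `\s` versus `\\s` when the string is later used as regex source.

```typescript
// Illustrative only: the pattern text is hypothetical.
// Inside a template literal, `\s` is an unrecognized escape and yields plain "s".
const emittedBroken = `password:\s*\S+`; // becomes "password:s*S+"
const emittedFixed = `password:\\s*\\S+`; // becomes "password:\s*\S+"

const brokenRe = new RegExp(emittedBroken);
const fixedRe = new RegExp(emittedFixed);

// The broken regex looks for literal "s" and "S" characters after the colon.
const caughtByBroken = brokenRe.test("password: secret");
const caughtByFixed = fixedRe.test("password: secret");
```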
1281
+ ### Refactor — migrate ANSI escape codes to picocolors
1282
+
1283
+ Replaced 58 inline `\x1b[NNm` ANSI styling literals across
1284
+ `src/surface/{cli,doctor,onboarding,readiness}.ts` and
1285
+ `src/core/observability/logger.ts` with `picocolors` calls. Same visual
1286
+ output (picocolors emits the same standard ANSI codes at runtime, with
1287
+ semantic close codes — `[22m` for bold-off, `[39m` for color-default —
1288
+ instead of heavy-handed `[0m` everywhere, which actually composes
1289
+ better when colors nest).
1290
+
1291
+ Motivation: third-party static analyzers (SafeSkill etc.) flagged
1292
+ inline `\x1b` hex escapes as "potentially obfuscated content" — a
1293
+ malware-detection heuristic that doesn't account for the fact that any
1294
+ CLI with colored output uses exactly that syntax. Routing through
1295
+ picocolors moves the escape codes into a vetted dependency, so source
1296
+ scanners no longer see them as suspicious literals. Added
1297
+ `picocolors@^1.1.1` (zero-deps, ~3 KB).
1298
+
1299
+ The logger's `C` color table is now keyed to picocolors style
1300
+ functions instead of raw escape strings; `colorize`, `layerTag`,
1301
+ `mapStrategyTag` updated accordingly. The ANSI-stripping regex in
1302
+ `pad()` is built from `String.fromCharCode(27)` instead of `\x1b` so
1303
+ the source itself carries no hex escape.
1304
+
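Building the stripping regex without a hex escape in source can be sketched as follows. `String.fromCharCode(27)` is the technique named above; `ANSI_SGR` and `visibleLength` are hypothetical names, and the real `pad()` may match a wider set of sequences.

```typescript
// Hypothetical sketch of an escape-free ANSI stripper.
const ESC = String.fromCharCode(27); // the ESC control character, without a hex literal
const ANSI_SGR = new RegExp(ESC + "\\[[0-9;]*m", "g");

// Length of a string as displayed, ignoring color/style sequences.
function visibleLength(s: string): number {
  return s.replace(ANSI_SGR, "").length;
}
```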
1305
+ Platform-layer control-char sanitization regexes (`/[\r\n\t\x00-\x1f]/`)
1306
+ in `src/platform/*.ts` are intentionally **not** migrated — those are
1307
+ input filters, not styling, and aren't what static analyzers were
1308
+ flagging as critical.
1309
+
1310
+ ### Docs — SKILL.md frontmatter leads with FALLBACK ONLY
1311
+
1312
+ The frontmatter `description` field — what skill registries and AI
1313
+ tool indexes display before an agent opens the file — now leads with
1314
+ "FALLBACK ONLY" + the explicit numbered 4-gate (native API → CLI →
1315
+ file edit → existing browser automation), instead of the softer "skill
1316
+ of last resort that gives AI agents eyes…" wording that front-loaded
1317
+ the capability claim. The body content already had the same 4-gate
1318
+ (lines 46–54 and 197–208); this aligns the frontmatter with that body
1319
+ messaging. PR #95.
1320
+
1321
+ ### Internal — release-time version sync
1322
+
1323
+ `scripts/sync-version.ts` reads `package.json` at release time and
1324
+ propagates the version into `SKILL.md` frontmatter, `docs/index.html`
1325
+ hero/footer, and the install script header pins. Wired into npm's
1326
+ `version` lifecycle hook so `npm version <bump>` updates everything
1327
+ in one shot. Removes drift opportunity between `package.json` and the
1328
+ website / SKILL frontmatter that previously had to be hand-synced.
1329
+
1330
+ ### Internal — tool-count cleanup
1331
+
1332
+ User-visible runtime output and the marketing site previously claimed
1333
+ 89 or 93 tools in places where the actual catalog was 97. `doctor.ts`
1334
+ post-success panel and `docs/index.html` hero/spec/mode-stats now match
1335
+ the registry. Historical "What's new" entries (e.g. v0.9.0's "89
1336
+ granular + 6 compact") are left as-is — they're accurate to the
1337
+ release they describe.
1338
+
1339
+ ### Migration
1340
+
1341
+ No action needed for fresh installs. Users stuck in a broken PID-lock
1342
+ state should update, then run `rm ~/.clawdcursor/mcp.pid` (or
1343
+ `clawdcursor stop`) once to clear the legacy lockfile the prior version
1344
+ left behind. From then on the new code self-heals.
1345
+
1346
+ ## [0.9.1] - 2026-05-14 — compose-send fix + scheduled tasks
1347
+
1348
+ A user-reported regression on macOS plus a long-missing daemon feature. No
1349
+ breaking changes; safe upgrade from v0.9.0.
1350
+
1351
+ ### Fixed — compose-send playbook (real user-reported bug)
1352
+
1353
+ A v0.9.0 user on macOS asked "open mail app and send an email to X
1354
+ introducing yourself." The trace reported `✅ done · path=playbook · 2/2
1355
+ subtasks · $0.0000`, but the actual send was broken: the body landed in
1356
+ the wrong field (and/or merged with the subject field). **No LLM was
1357
+ called and no vision fallback ever fired** — the bug was 100% in the
1358
+ deterministic playbook plus a verifier bypass that let the playbook
1359
+ self-certify. Three layered fixes, plus a clearer summary line:
1360
+
1361
+ - **Platform-aware Tab count after recipient** in
1362
+ `src/tools/playbooks/compose-send.ts`. The previous code fired TWO Tabs
1363
+ after typing the recipient, assuming every mail app shows Cc/Bcc inline.
1364
+ macOS Mail.app's default layout has Cc/Bcc collapsed — Tab order is
1365
+ `To → Subject → Body`. Two Tabs overshot Subject and landed on Body.
1366
+ New: 1 Tab on darwin/linux, 3 Tabs on win32 (Outlook desktop default),
1367
+ via a `tabsAfterRecipient()` helper. Documented per-platform in the
1368
+ module header.
1369
+ - **Decoupled the post-subject Tab from `if (subject)`**. The advance to
1370
+ Body now fires unconditionally so a task with no explicit subject (the
1371
+ user's "introducing yourself" case) still lands the body in the right
1372
+ field instead of typing it into whatever the previous Tab happened to
1373
+ leave focus on.
1374
+ - **Removed playbook exemption from the verifier** in
1375
+ `src/core/pipeline.ts:649-655`. The router exemption stays (router has
1376
+ its own window-list-diff evidence). Playbooks now go through the
1377
+ ground-truth verifier like every other rung — the rich `send_email`
1378
+ task assertions (`compose_closed` via full window list, `recipient_visible`,
1379
+ `not_just_saved_as_draft` anti-signal) were designed for exactly this
1380
+ bug class but couldn't catch it because they never ran. Verifier is
1381
+ <500ms; soft-fail-on-low-confidence policy stays in place for legitimate
1382
+ idempotent operations.
1383
+ - **Better summary line**: `compose-send: to=… subject=… body=…ch
1384
+ tabs-after-to=…` now reports parsed field state and platform Tab count
1385
+ in the trailing PIPELINE_DONE line. Empty subject was the original
1386
+ diagnostic signal in the user-reported bug — now it's visible at a
1387
+ glance.
1388
+
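The Tab-count logic in the first two fixes can be sketched as follows. Only the `tabsAfterRecipient()` name and the per-platform counts come from the entry above; the `Step` type, `composeSteps`, and the default for platforms other than darwin/linux/win32 are assumptions for illustration.

```typescript
type Step =
  | { kind: "type"; text: string }
  | { kind: "key"; key: "Tab" };

// 3 Tabs on win32, 1 on darwin/linux per the changelog. Treating any other
// platform like darwin/linux is an assumption of this sketch.
function tabsAfterRecipient(platform: string): number {
  return platform === "win32" ? 3 : 1;
}

// Hypothetical helper: builds the keystroke plan for a compose window.
function composeSteps(platform: string, to: string, body: string, subject?: string): Step[] {
  const steps: Step[] = [{ kind: "type", text: to }];
  for (let i = 0; i < tabsAfterRecipient(platform); i++) {
    steps.push({ kind: "key", key: "Tab" });
  }
  if (subject) {
    steps.push({ kind: "type", text: subject });
  }
  // Advance Subject -> Body unconditionally, even when no subject was typed.
  steps.push({ kind: "key", key: "Tab" });
  steps.push({ kind: "type", text: body });
  return steps;
}
```

With no subject on darwin the plan is recipient, Tab, Tab, body: the first Tab reaches Subject and the unconditional second one reaches Body.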
1389
+ ### Added — Scheduled tasks (new feature, requested)
1390
+
1391
+ Cron-driven recurring tasks that fire through the same agent pipeline as
1392
+ `submit_task`. Persisted across daemon restarts. **Dashboard gets a new
1393
+ ⏰ Scheduled tab** with cron + task inputs, an active-schedule list, and
1394
+ per-row pause / delete buttons.
1395
+
1396
+ - **`src/tools/scheduler.ts`** — 4 new MCP tools:
1397
+ - `scheduled_task_create({ task, cron, tz? })` — validates the cron up
1398
+ front (`croner`), persists, registers an in-process cron job that
1399
+ dispatches via `agent.executeTask`.
1400
+ - `scheduled_task_list()` — returns every persisted task with run /
1401
+ skip / lastError counters and a computed `nextRun` ISO timestamp.
1402
+ - `scheduled_task_delete({ id })` — unregisters + removes from disk.
1403
+ - `scheduled_task_toggle({ id, enabled })` — pause/resume without
1404
+ deleting; disabled tasks stay persisted but their cron job is
1405
+ unregistered.
1406
+ - **Storage**: `~/.clawdcursor/scheduled-tasks.json`. Path is computed
1407
+ dynamically (honors `CLAWD_HOME`) so tests and forks can redirect.
1408
+ - **Reentrancy**: if a tick fires while the agent is busy, the task is
1409
+ skipped and `skipCount` increments. No queue, no pile-up. Predictable.
1410
+ - **Boot lifecycle**: `clawdcursor agent` calls `initScheduler(agent)` on
1411
+ startup (only when an LLM is configured — the scheduler requires the
1412
+ autonomous agent to dispatch into). Daemon shutdown calls
1413
+ `stopScheduler()` to cleanly unregister all jobs.
1414
+ - **Auth**: every scheduler tool sits behind the same bearer-token gate
1415
+ as the rest of the MCP HTTP surface (`/mcp` already wraps `requireAuth`).
1416
+ - **Dependency**: adds `croner@^9.1.0` (zero-dep cron parser, ~7 KB).
1417
+
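The reentrancy rule above (skip when busy, no queue) can be sketched without the cron library. Only `skipCount` and `agent.executeTask` are named in the entry; `runCount`, `isBusy()`, the `AgentLike` interface, and `onTick` are hypothetical names for this illustration.

```typescript
interface ScheduledTask {
  id: string;
  task: string;
  cron: string;
  enabled: boolean;
  runCount: number; // hypothetical field name
  skipCount: number;
  lastError?: string;
}

// Hypothetical minimal view of the agent; only executeTask is named in the changelog.
interface AgentLike {
  isBusy(): boolean;
  executeTask(task: string): Promise<unknown>;
}

// Called by the cron job on each tick.
function onTick(entry: ScheduledTask, agent: AgentLike): "dispatched" | "skipped" {
  if (agent.isBusy()) {
    entry.skipCount += 1; // no queue: the missed tick is simply dropped
    return "skipped";
  }
  entry.runCount += 1;
  agent.executeTask(entry.task).catch((err: unknown) => {
    entry.lastError = String(err);
  });
  return "dispatched";
}
```

Dropping the tick rather than queueing it means a long-running task can never cause a backlog of stale runs to fire later.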
1418
+ ### Stats
1419
+
1420
+ - Tool count: **89 → 93** (+4 scheduled_task_* tools)
1421
+ - Tests: **759 → 776** (+5 playbook tests + 14 scheduler tests, all green)
1422
+ - Schema snapshot regenerated.
1423
+
1424
+ ### Migration
1425
+
1426
+ None. Drop-in upgrade from v0.9.0.
1427
+
1428
+ ---
1429
+
1430
+ ## [0.9.0] - 2026-05-14 — Architecture redesign + guides marketplace
1431
+
1432
+ The largest release since v0.7. Net change vs v0.8.17: **−10,200 LOC, +14 new MCP tools, one protocol instead of two, five directories instead of seven**, plus a Reflector feedback channel that closes the loop between verifier signals and planner decisions, plus a public guides marketplace where community-contributed app knowledge ships independently of the binary.
1433
+
1434
+ ### Architectural rewrite
1435
+
1436
+ - **One protocol, two transports.** REST surface (`/task`, `/tools`, `/execute/:name`, `/favorites`, `/learn`, `/screenshot`, `/abort`, `/confirm`, `/logs`, `/task-logs`) is gone. Every former REST endpoint is now an MCP tool. The HTTP daemon serves stateless MCP at `POST /mcp` alongside `/health`, `/stop`, and `/` (dashboard).
1437
+ - **Five directories under `src/`.** `core/` (agent loop + pipeline + verifier + safety + skills), `tools/` (one registry, 89 granular + 6 compound), `platform/` (Windows / macOS / Linux X11 / Linux Wayland adapters + Swift host app), `llm/` (providers + credentials + knowledge), `surface/` (CLI + MCP server + dashboard). One concern per directory, no upward dependencies.
1438
+ - **Legacy cascade removed.** The v0.7-era cascade (`computer-use.ts`, `ai-brain.ts`, `action-router.ts`, `generic-computer-use.ts`, 14 more modules — ~12 k LOC) deleted along with the `--legacy` flag and `_executeTaskInternal`. Tag `v0.8.17-legacy` preserves the cascade for emergency cherry-pick.
1439
+ - **CLI verb rename.** `clawdcursor start` → `clawdcursor agent`; `clawdcursor serve` → `clawdcursor agent --no-llm`. Old verbs still work as deprecation aliases through 0.9.x; removed in 0.10.
1440
+
1441
+ ### Reflector feedback (CLAWD_REFLECTOR=1)
1442
+
1443
+ The verifier now produces structured `ReflectionFeedback` with typed `Cause[]` and an optional `suggestedStrategy`. Six cause kinds: `no_pixel_change`, `wrong_window_focused`, `modal_intercept`, `a11y_target_missing`, `webview_blind`, `partial_text_match`. The pipeline ladder reroutes based on the dominant cause instead of just rolling down — `webview_blind` jumps straight to vision, `modal_intercept` retries after dismissal. Behind a feature flag for one cycle; default-on in 0.9.1 if telemetry is positive.
1444
+
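A rough sketch of cause-driven rerouting. The six cause kinds, `ReflectionFeedback`, `Cause[]`, and `suggestedStrategy` come from the paragraph above; the `weight` field, the `NextStep` labels, and how the dominant cause is chosen are assumptions.

```typescript
type CauseKind =
  | "no_pixel_change"
  | "wrong_window_focused"
  | "modal_intercept"
  | "a11y_target_missing"
  | "webview_blind"
  | "partial_text_match";

interface Cause { kind: CauseKind; weight: number } // weight is hypothetical
interface ReflectionFeedback { causes: Cause[]; suggestedStrategy?: string }

// Hypothetical labels for what the pipeline ladder does next.
type NextStep = "jump_to_vision" | "dismiss_modal_and_retry" | "next_rung";

// Assumption: the dominant cause is the one with the highest weight.
function dominantCause(feedback: ReflectionFeedback): Cause | undefined {
  let best: Cause | undefined;
  for (const c of feedback.causes) {
    if (best === undefined || c.weight > best.weight) best = c;
  }
  return best;
}

function reroute(feedback: ReflectionFeedback): NextStep {
  const cause = dominantCause(feedback);
  switch (cause?.kind) {
    case "webview_blind":
      return "jump_to_vision";
    case "modal_intercept":
      return "dismiss_modal_and_retry";
    default:
      return "next_rung"; // previous behaviour: roll down the ladder
  }
}
```

Only the two reroutes named in the entry are special-cased here; every other cause falls back to the old roll-down behaviour.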
1445
+ ### Safety + correctness
1446
+
1447
+ - **Five tools promoted to Tier 2 (mutation)** after an external audit: `open_file`, `open_url`, `open_uri`, `navigate_browser`, `write_clipboard`. Each can trigger arbitrary OS handlers, network egress, or clipboard hijack — Tier 1 understated the risk.
1448
+ - **Sensitive-app safety gate now actually elevates** instead of just logging. Clicking inside Outlook / 1Password / Mail / banking / private-messaging with no target label → `confirm` (not `allow`).
1449
+ - **App-pattern data consolidated** into `src/core/app-categories.ts`. Single source of truth for the WebView2 settle list + sensitive-app list. The autonomous pipeline never imports it.
1450
+ - **Stateless MCP HTTP transport.** Per-request transport lifecycle, `enableJsonResponse: true` so clients receive plain JSON-RPC instead of SSE event-stream framing they choke on.
1451
+
1452
+ ### Agent-loop reliability
1453
+
1454
+ - **Soft-fail subtask policy.** Low-confidence verifier rejection (< 0.5) on a single subtask logs a warning and continues. Idempotent operations like "create new canvas" after `open_app("Paint")` (pixel-change zero because Paint already opened blank) no longer kill the chain at subtask 2.
1455
+ - **Runaway guard on consecutive no-tool-call turns.** Three turns of degenerate model output (e.g. Kimi hitting `max_tokens` with token-loop garbage) trigger a clean rung exit instead of burning the full 5-minute task timeout.
1456
+ - **Kimi `moonshot-v1-*` prose-tool-call parser updated** for the new `functions.NAME:N->{_{...}}` format the model now emits.
1457
+ - **Per-task PIPELINE_DONE footer always fires** with `success/failed (reason) · path · N/M subtasks · $cost · duration`. Was missing on chain-abort + isAborted paths.
1458
+ - **DPI mouse-scale fix.** Both stdio MCP and `clawdcursor agent` now use `physical/image` as the mouseScaleFactor source. Vision-driven clicks land where intended on HiDPI Windows / Retina macOS instead of being 2× too far towards top-left.
1459
+ - **DPI info injected into agent prompt** so models that try to "help" by self-scaling don't pre-multiply.
1460
+
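The soft-fail policy and the runaway guard from the list above can be sketched as two small functions. The 0.5 threshold and the three-turn limit come from the entries; the `VerifyResult` shape, the outcome labels, and the assumption that a confident rejection still aborts the chain are illustrative.

```typescript
// Hypothetical shape; the real verifier result likely carries more fields.
interface VerifyResult { passed: boolean; confidence: number }

const SOFT_FAIL_THRESHOLD = 0.5;
const MAX_EMPTY_TURNS = 3;

// A rejection the verifier is not confident about is downgraded to a warning.
// Assumption: a confident rejection still aborts the chain.
function subtaskOutcome(result: VerifyResult): "continue" | "warn_and_continue" | "abort_chain" {
  if (result.passed) return "continue";
  return result.confidence < SOFT_FAIL_THRESHOLD ? "warn_and_continue" : "abort_chain";
}

// Counts consecutive turns without a tool call; any tool call resets the streak.
function shouldExitRung(turnHadToolCall: boolean[]): boolean {
  let streak = 0;
  for (const hadCall of turnHadToolCall) {
    streak = hadCall ? 0 : streak + 1;
    if (streak >= MAX_EMPTY_TURNS) return true;
  }
  return false;
}
```

In the Paint example, a zero-pixel-change rejection at low confidence would map to `warn_and_continue`, so subtask 3 still runs.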
1461
+ ### Tools
1462
+
1463
+ - **Tool count 75 → 89.** Fourteen new MCP tools absorbed the former REST endpoints + the marketplace surface: `submit_task`, `abort_task`, `agent_status`, `screenshot_full`, `favorites_list/_add/_remove`, `task_logs_list/_current`, `logs_recent`, `learn_app`, `submit_report`, plus two new guides-management entries.
1464
+ - **Tool registry unified.** Compact (6 compounds) is now a transform over the granular registry, not a parallel catalog. One source of truth, no drift.
1465
+ - **MCP `open_app` uses alias table + PlatformAdapter** instead of raw `Start-Process`. Calculator, Win11 Notepad, and other UWP apps work correctly.
1466
+ - **`focus_window` AND-matches** when given both pid + title — needed for Win11's tabbed Notepad where multiple windows share a pid.
1467
+ - **`type_text` preserves the user's clipboard** around its paste-as-type operation. Was silently clobbering.
1468
+
1469
+ ### Guides marketplace (new)
1470
+
1471
+ clawdcursor reasons about every app from screenshots and a11y trees. For popular apps that's slow. v0.9 ships a **marketplace of community-curated app guides** the agent fetches on demand, caches locally based on usage, and uses to operate apps 5–10× faster — without ever blocking the agent loop on the network.
1472
+
1473
+ - **Public registry at <https://clawdcursor.com/app-guides>**, backed by the GitHub repo <https://github.com/AmrDab/clawdcursor-guides>. PR-based submissions, native GitHub identity as anti-spam, vote-issues for ratings (`vote: <app>` issues with 👍/👎 reactions aggregated nightly into `index.json`).
1474
+ - **10 verified seed guides at launch**: gmail, outlook, slack, youtube (the rich-multi-task reference — 19 workflows, 36 shortcuts, 8 layout regions, 13 tips), figma, discord, excel, mspaint, olk (new Outlook), spotify. Maintainer trust labels: `trust:verified` / `trust:community` / `trust:experimental`.
1475
+ - **Three new client-side modules**:
1476
+ - `src/llm/knowledge/remote-loader.ts` — `fetchGuide(app)` with timeout, conditional GET via ETag, stale-while-revalidate.
1477
+ - `src/llm/knowledge/cache.ts` — LRU + TTL (7 days, 50 entries). `touchUsage` reorders LRU on every hit, so popular guides survive eviction even when not most-recently-fetched.
1478
+ - `src/llm/knowledge/guide-linter.ts` — defense-in-depth: schema validation + prompt-injection patterns + dangerous-prose detection runs on every guide before injection, regardless of source (bundled, cached, user-override). Failed guides drop to null — agent falls back to first-principles reasoning, never poisoned-knowledge.
1479
+ - **Bundled core trimmed to 2 guides** (msedge + notepad — Windows defaults that ship with every install). The other 10 curated guides moved to `seed-registry/guides/` and uploaded to the GitHub repo. Lighter binary; guides update independently of releases.
1480
+ - **`clawdcursor guides` CLI rewritten**: `list`, `info <app>`, `available`, `install <app>` / `install --all`, `refresh <app>`, `remove <app>`, `clean`, `lint <file>`, `submit <file>` (lints + prints PR instructions).
1481
+ - **Preprocessor fires `prefetchGuideForApp(app)` async** the moment it detects an active window — by the next task, the cache is warm. First-touch uses whatever's local; subsequent tasks are fast.
1482
+ - **`learn_app` writes rerouted** to the user-override dir at `~/.clawdcursor/ui-knowledge/{app}.json` (was writing into the bundled source tree where the next install would clobber it). Auto-saves successful task patterns under `learnedWorkflows`; FIFO-capped at 20 per app.
1483
+ - **Rich prompt fragment renderer** (`renderAppKnowledge`): the agent now sees SHORTCUTS / WORKFLOWS (★-marked active one first) / LAYOUT / TIPS instead of just 8 comma-joined shortcuts. Cap 6000 chars with graceful degradation; non-active workflows truncated to 180 chars so a 20-workflow guide doesn't crowd out layout.
1484
+
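A minimal sketch of an LRU + TTL cache in the spirit of the `cache.ts` entry above. The 7-day TTL, 50-entry cap, and `touchUsage` name come from the changelog; the class name, method signatures, and the hard expiry (no stale-while-revalidate) are simplifying assumptions.

```typescript
interface CacheEntry<V> { value: V; fetchedAt: number }

// Hypothetical class; relies on Map preserving insertion order for LRU.
class GuideCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;

  constructor(maxEntries: number = 50, ttlMs: number = 7 * 24 * 60 * 60 * 1000) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  // Move a key to the most-recently-used end of the insertion order.
  touchUsage(key: string): void {
    const entry = this.entries.get(key);
    if (entry === undefined) return;
    this.entries.delete(key);
    this.entries.set(key, entry);
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (now - entry.fetchedAt > this.ttlMs) {
      this.entries.delete(key); // expired
      return undefined;
    }
    this.touchUsage(key);
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    this.entries.set(key, { value, fetchedAt: now });
    // Evict from the least-recently-used end until under the cap.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }
}
```

Because `get` calls `touchUsage`, a guide that is read often stays near the fresh end even if it was fetched long before its neighbours.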
1485
+ ### Router
1486
+
1487
+ - **Web-service redirect layer** (`src/core/router/web-services.ts`, 60-entry table). Requests such as "open youtube" / "open reddit" / "open gmail" now redirect to a URL navigation (e.g. `handleUrlNav('https://www.youtube.com')`) via the OS default browser, instead of falling through to Start-Menu search → blind-agent escalation. Closes a v0.9 failure mode where the agent typed the literal phrase "default browser" into a search bar. Native-client preference preserved: "open chrome" still launches the desktop client.
1488
+ - **System-context preamble** in the blind/hybrid agent system prompt (`src/core/agent-loop/prompt.ts` section 5c): web services → `open_url(URL)`, never type "browser" into search bars, don't emit "open chrome" before "navigate" unless explicitly named.
1489
+
1490
+ ### Verifier
1491
+
1492
+ - **`send_email` no longer falsely passes** when a popup steals foreground. Previous logic checked only `after.activeWindow.title` for compose-window absence — a banner popup focusing the agent's window inverted the check and the verifier reported success while Send was never clicked. Fix iterates the full `after.windows` list (`composeStillOpen = (after.windows ?? []).some(w => !w.isMinimized && composeKeywords.test(w.title))`). Also added: success-keyword detection (`message sent | email sent | sent successfully`), `not_just_saved_as_draft` anti-signal (rejects when "Draft saved" appears without success notice), expanded compose regex to include `reply`.
1493
+
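The full-window-list check quoted above can be exercised in isolation. The `composeStillOpen` expression is taken from the entry; the `WindowInfo` / `Snapshot` shapes and the contents of `composeKeywords` are assumptions (the changelog only confirms that `reply` was added to the regex).

```typescript
interface WindowInfo { title: string; isMinimized: boolean }
interface Snapshot { activeWindow?: WindowInfo; windows?: WindowInfo[] }

// Assumed keyword list; only `reply` is confirmed by the changelog.
const composeKeywords = /compose|new message|reply/i;

// True if any non-minimized window still looks like a compose window,
// regardless of which window currently has focus.
function composeStillOpen(after: Snapshot): boolean {
  return (after.windows ?? []).some(
    (w) => !w.isMinimized && composeKeywords.test(w.title),
  );
}

// A banner popup has stolen focus, but the compose window is still open.
const popupStoleFocus: Snapshot = {
  activeWindow: { title: "Notification", isMinimized: false },
  windows: [
    { title: "Notification", isMinimized: false },
    { title: "New Message", isMinimized: false },
  ],
};
```

Checking only `activeWindow.title` would report the compose window as gone for `popupStoleFocus`; iterating `windows` does not.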
1494
+ ### Doctor
1495
+
1496
+ - **Post-doctor "All systems go" panel rewritten** for clarity on the two access paths: the MCP server for editor hosts (`clawdcursor mcp`) exposes 89 desktop tools (or 6 compound with `--compact`); the HTTP daemon (`clawdcursor agent`) handles unattended autonomy. The panel detects at runtime whether an LLM is configured and shows "(you have one)" in green or "(none yet)" in yellow.
1497
+
1498
+ ### Cross-platform integrity
1499
+
1500
+ - **All four OS adapters preserved.** Windows (1,220 LOC) + macOS (903 LOC) + Linux X11 (1,285 LOC) + Linux Wayland (343 LOC) — 3,751 LOC of adapter code, no regression from v0.8.
1501
+ - **macOS host app intact.** `ClawdCursorHost` Swift bundle, `permission-check`, `screenshot-helper`, `clawdcursor grant` flow — all preserved + path-resolution fixed (`getPackageRoot()`) so the host app is found correctly after the directory restructure.
1502
+
1503
+ ### Documentation
1504
+
1505
+ - **Professional README rewrite** (340 lines): hero badge row, Mermaid pipeline diagram with Reflector feedback edges, transport / cost-tier / cross-platform / compound-tool tables, 5-directory architecture summary. Modeled on `ollama`, `vercel/ai`, `microsoft/playwright`, `modelcontextprotocol/typescript-sdk`.
1506
+ - **Post-install + post-build banners are state-aware**: skip "Run consent" / "Run doctor" lines when the user already did them on a prior install.
1507
+ - **Two-path next-step routing** at install / consent / doctor: autonomous agent (`doctor` → `agent`) vs MCP-only (register `clawdcursor mcp` with editor host).
1508
+ - **SKILL.md reordered**: fallback discipline first, "no task impossible" confidence second, CAN/MUST/SHOULD third — load-bearing identity preserved verbatim.
1509
+ - **MACOS-SETUP, agent-guide, OPENCLAW-INTEGRATION-RECOMMENDATIONS, dashboard, website** all migrated from REST to MCP HTTP transport language.
1510
+ - **`docs/internal/v0.9-readme-building-blocks.md`** + **`docs/internal/agnostic-audit-report.md`** archived as design records (moved out of the published website root before release).
1511
+
1512
+ ### Release hygiene
1513
+
1514
+ - Removed orphan `docs/v0.7.5/` (v0.7-era landing page not linked anywhere).
1515
+ - `package.json` gains `repository`, `homepage`, `bugs`, `author`, `keywords`.
1516
+ - `.nvmrc` added (Node 20).
1517
+ - CI badge URL corrected to the actual workflow filename.
1518
+
1519
+ ---
1520
+
1521
+ ## [0.8.8] - 2026-05-05 — Reliability + correctness: mod modifier, compact set_value, smart_click foreground OCR, invoke-element timeout
1522
+
1523
+ A focused reliability release closing several real bugs surfaced by a production session (issue #71) and a thorough ultrareview of the v0.8.5 work. Two of the bugs were silent failures — the worst kind for an agent — and one was a hard hang in the standalone PowerShell scripts. Plus a routine round of major-version dependency bumps (express 5, commander 14, dotenv 17, sharp 0.34) and a lint cleanup pass.
1524
+
1525
+ ### Fixed
1526
+
1527
+ - **`mod` modifier now resolves correctly on every platform.** The legacy `NativeDesktop` (which `ctx.desktop` binds to in the granular tool registry) had no `mod` translation — only the v2 `PlatformAdapter` did. Calling `computer({"action":"key","combo":"mod+s"})` either threw `Unknown key: "mod"` (Win/Linux) or silently dropped the modifier and typed a literal `s` (macOS). Three coordinated fixes:
1528
+ - `src/keys.ts`: add `mod` to `KEY_ALIASES` resolved at module load to `Super` on darwin and `Control` elsewhere.
1529
+ - `src/native-desktop.ts:707-712`: extend the `macKeyPress` modifier loop to treat `mod` as `command down`. The loop did direct string comparison, so the alias alone wasn't enough.
1530
+ - `src/pipeline/playbooks/keys-blocklist.ts:14-22`: extend `normalizeCombo` so `mod+q` matches `cmd+q` on darwin (otherwise the safety gate would let `mod+q` quit-app through on macOS).
1531
+ - **Compact `accessibility({"action":"set_value", ...})` was broken.** `src/tools/compact.ts:93` delegated to `set_field_value`, but no granular tool by that name was registered (only the agent-internal palettes had it). Calls returned `{isError: true, text: "delegate not registered"}`. Registered the missing tool in `getA11yDepthTools()` mirroring `a11y_expand`/`a11y_toggle`. Tool count: 74 → 75. Schema snapshot regenerated.
1532
+ - **`smart_click` OCR matched text in background windows.** Full-screen OCR scoring iterated all elements and broke on the first exact match, so text in a non-focused window (e.g. Outlook visible behind a "Pick an account" dialog showing the same email) could win and cause a silent wrong-click. Refactored ranking into a `pickBest` helper that runs two passes: foreground-window first (using `activeWin.bounds`), full-screen only if foreground produced no match — with a `[WARNING: matched outside focused window]` annotation in the response so the agent has a signal to verify. From issue #71 review.
1533
+ - **`invoke-element.ps1` hung on React/Electron buttons that advertise InvokePattern but block on Invoke.** The legacy try/catch fallback chain (Invoke → Toggle → bounds) only fired when a pattern *threw*, not when one blocked indefinitely. Wrapped the pattern call in `System.Threading.Tasks.Task::Run` with a 2s `Wait(timeout)`. On timeout the script emits the same `success:false + clickPoint` JSON the existing catch produces. Direct callers of the script benefit; HTTP/MCP callers were already protected by `smart_click`'s 10s outer timeout. From issue #71.
1534
+ - **OpenClaw install metadata used `npm install -g clawdcursor`** but the package isn't published to npm (registry returns 404). OpenClaw following `metadata.openclaw.install` step 1 verbatim would abort before reaching `clawdcursor consent --accept`. Replaced with the documented `curl -fsSL https://clawdcursor.com/install.sh | bash` path that matches every other install surface.
1535
+
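The two-pass ranking described in the `smart_click` fix above might look roughly like this. The `pickBest` name, the foreground-first order, and the warning text come from the entry; the types, the centre-point containment test, and exact-match-only scoring are assumptions.

```typescript
interface Bounds { x: number; y: number; width: number; height: number }
interface OcrElement { text: string; bounds: Bounds }
interface OcrPick { element: OcrElement; warning?: string }

// Assumption: an element belongs to a window if its centre lies inside it.
function centerInside(inner: Bounds, outer: Bounds): boolean {
  const cx = inner.x + inner.width / 2;
  const cy = inner.y + inner.height / 2;
  return cx >= outer.x && cx <= outer.x + outer.width
    && cy >= outer.y && cy <= outer.y + outer.height;
}

function pickBest(elements: OcrElement[], target: string, foreground?: Bounds): OcrPick | undefined {
  const wanted = target.trim().toLowerCase();
  const matches = elements.filter((e) => e.text.trim().toLowerCase() === wanted);

  // Pass 1: only matches inside the focused window.
  if (foreground !== undefined) {
    const fg: Bounds = foreground;
    const inFront = matches.find((e) => centerInside(e.bounds, fg));
    if (inFront !== undefined) return { element: inFront };
  }

  // Pass 2: full screen, flagged so the caller knows to verify.
  const anywhere = matches[0];
  if (anywhere === undefined) return undefined;
  return { element: anywhere, warning: "[WARNING: matched outside focused window]" };
}
```

The warning is the important part: a background match is still returned, but the agent gets a signal instead of a silent wrong-click.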
1536
+ ### Changed
1537
+
1538
+ - **Major dependency bumps**, all CI-green across the cross-platform matrix:
1539
+ - `express` 4.21.2 → 5.2.1 (major) + `@types/express` 4 → 5
1540
+ - `commander` 12.1.0 → 14.0.3 (major)
1541
+ - `dotenv` 16.x → 17.4.2 (major)
1542
+ - `sharp` 0.33.5 → 0.34.5
1543
+ - `eslint` group bumps within v10
1544
+ - **Lint hygiene** — cleared all 10 `@typescript-eslint/no-unused-vars` warnings the CI was surfacing as annotations (74 → 64 warnings). Trivial cleanup, no functional impact: dropped unused test imports (`path`, `afterEach`, `vi`, `beforeEach`, `VerifyResult`, `PipelineConfig`), removed the dead `makePipelineConfig` helper in verifiers.test.ts, renamed `step` to `_step` in `a11y-reasoner.ts:1079` (eslint config already allowed the `^_/u` prefix), and dropped unused error bindings on two `catch (e)` / `catch (err)` blocks.
1545
+
1546
+ ### Documentation
1547
+
1548
+ - SKILL.md "What's new" expanded with the 0.8.8 section.
1549
+ - README "Latest Release" updated.
1550
+ - `docs/index.html` (homepage) bumped to v0.8.8 across title, meta tags, hero badge, agent-readable summary, and footer.
1551
+
1552
+ ---
1553
+
1554
+ ## [0.8.7] - 2026-05-02 — Security hardening: direct-tool safety gate, version-string single-source, tooling bumps
1555
+
1556
+ A security-focused patch release. The headline is a real behaviour change: every direct tool invocation — both the REST `/execute/:name` endpoint and the MCP `callTool` handler — now passes through a shared safety gate, so direct callers can no longer bypass the checks the agent loop already enforced. Plus: the version string is now single-sourced (no more `0.7.2` showing up in MCP metadata three releases late), and the dev tooling is current (TypeScript 6.0, ESLint 10).
1557
+
1558
+ ### Fixed
1559
+
1560
+ - **Direct tool execution bypassed safety checks.** REST `/execute/:name` and MCP `callTool` invoked tools without consulting the same gate the agent loop used. A misconfigured client could reach `confirm`-tier or blocked tools without the expected guardrails. New `src/tools/safety-gate.ts` (~40 lines) wraps every direct invocation; both entry points (`src/index.ts`, `src/tool-server.ts`) now route through it. Read-only, blocked, and confirm-tier decisions resolve identically across REST, MCP, and the agent loop. Test coverage in `src/__tests__/tool-safety-gate.test.ts`.
1561
+ - **Accessibility / window / clipboard reads now use `PlatformAdapter` consistently.** `src/tools/a11y.ts` previously called the underlying OS APIs directly; it now routes through the shared adapter like the rest of the codebase, with a legacy fallback if the adapter is unavailable.
1562
+
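A simplified sketch of a shared gate that every direct entry point calls before running a tool. The allow / confirm / block outcomes follow the entry above; the tier labels, `ToolDef`, `decide`, and `invokeThroughGate` are hypothetical and not the contents of `src/tools/safety-gate.ts`.

```typescript
type Tier = "read_only" | "confirm" | "blocked"; // hypothetical tier labels
type Decision = "allow" | "confirm" | "block";

interface ToolDef {
  name: string;
  tier: Tier;
  run(args: Record<string, unknown>): string;
}

interface ToolResponse { isError: boolean; text: string }

// One decision function, so REST, MCP, and the agent loop cannot disagree.
function decide(tool: ToolDef): Decision {
  switch (tool.tier) {
    case "read_only":
      return "allow";
    case "confirm":
      return "confirm";
    case "blocked":
      return "block";
  }
}

// Entry points call this instead of calling tool.run directly.
function invokeThroughGate(tool: ToolDef, args: Record<string, unknown>, userConfirmed: boolean): ToolResponse {
  const decision = decide(tool);
  if (decision === "block") {
    return { isError: true, text: tool.name + " is blocked" };
  }
  if (decision === "confirm" && !userConfirmed) {
    return { isError: true, text: tool.name + " requires confirmation" };
  }
  return { isError: false, text: tool.run(args) };
}
```

Keeping the decision in one function is what makes the behaviour identical across transports; each entry point only differs in how it collects `userConfirmed`.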
1563
+ ### Changed
1564
+
1565
+ - **Version string is single-sourced from `package.json`.** `src/index.ts` (the `McpServer` constructor) and `src/onboarding.ts` (the consent file) each kept their own hardcoded copy of the version. Both fell out of sync — `index.ts` shipped `0.7.2` in the MCP handshake for several releases until v0.8.6 caught it manually. Both now import `VERSION` from `src/version.ts`, which already reads `package.json` at runtime. Adds `tests/version-drift.test.ts`: scans `src/**/*.ts` for any literal of the current `package.json` version and fails the build if found anywhere except `src/version.ts`. Future bumps only need to touch `package.json`.
1566
+ - **TypeScript 5.9.3 → 6.0.3** (devDependency). Major compiler bump. `tsconfig.json` adds `"ignoreDeprecations": "6.0"` to silence the new `moduleResolution: "node"` deprecation without changing runtime behaviour — the project remains CommonJS with the same module resolution semantics. A proper migration to `nodenext` can land in a later release.
1567
+ - **ESLint 9 → 10 + typescript-eslint plugins** (devDependency). Major linter bump. ESLint 10 promotes `no-useless-assignment` and `preserve-caught-error` into the recommended ruleset. Resolved all 8 new errors as actual code fixes rather than rule downgrades:
1568
+ - `cdp-driver.ts`: removed useless `let selector = ''` initialiser (all branches assign before use).
1569
+ - `doctor.ts`, `ocr-reasoner.ts`: scoped `smokeOk` and `guidePrompt` as `const` inside their try blocks (they were never read outside).
1570
+ - `compound.ts`: removed useless `= []` initialiser; the catch always returns, so TypeScript still considers `points` definitely assigned.
1571
+ - `smart-interaction.ts`: eliminated the `currentA11yState` tracking variable entirely — it was always equal to the fresh `a11yContext` read at the top of each ReAct loop iteration. Three useless-assignment sites disappear by replacing references with `a11yContext` directly.
1572
+ - `ui-driver.ts`: rethrown `SyntaxError` now includes `{ cause: err }`.
1573
+ - **Routine dependency hygiene.** Playwright `1.58.2 → 1.59.1`, ws `8.19.0 → 8.20.0`, postcss + `@types/*` group bumps, GitHub Actions `setup-node@v4 → v6`, `checkout@v4 → v6`.
1574
+
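The version-drift check described above could be approximated as a pure function over file contents. This is a sketch under assumptions: `findVersionLiterals`, the in-memory `files` map, and matching only quoted literals are illustrative choices, not the behaviour of the real `tests/version-drift.test.ts`.

```typescript
// Hypothetical helper: returns the paths that hardcode the given version as a
// quoted string literal, ignoring an allow-list of files.
function findVersionLiterals(
  files: Record<string, string>,
  version: string,
  allowed: string[] = ["src/version.ts"],
): string[] {
  const offenders: string[] = [];
  for (const path of Object.keys(files)) {
    if (allowed.indexOf(path) !== -1) continue;
    const content = files[path];
    const single = "'" + version + "'";
    const double = '"' + version + '"';
    if (content.indexOf(single) !== -1 || content.indexOf(double) !== -1) {
      offenders.push(path);
    }
  }
  return offenders.sort();
}

// Example input: one file imports VERSION, one still hardcodes it.
const exampleSources: Record<string, string> = {
  "src/version.ts": "export const VERSION = readPackageVersion();",
  "src/index.ts": "import { VERSION } from './version';",
  "src/onboarding.ts": "const consentVersion = '0.8.7';",
};
const driftOffenders = findVersionLiterals(exampleSources, "0.8.7");
```

A real test would read the files from disk and fail when the returned list is non-empty.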
1575
+ ### Documentation
1576
+
1577
+ - SKILL.md "What's new" expanded with the 0.8.7 section. README "Latest Release" updated.
1578
+ - `docs/index.html` (homepage) bumped to v0.8.7 across title, meta tags, hero badge, and footer.
1579
+
1580
+ ---
1581
+
1582
+ ## [0.8.6] - 2026-05-01 — Polish release: MCP server version, homepage simplification, repo hygiene
1583
+
1584
+ A short follow-up to 0.8.5 that closes one user-visible bug carried over from the v0.7.x line and a handful of professionalism gaps surfaced in a pre-release audit. No schema changes, no behavior changes for agents — purely metadata, docs, and the public landing page.
1585
+
1586
+ ### Fixed
1587
+
1588
+ - **`McpServer` advertised the wrong version.** `src/index.ts` constructed the MCP server with `version: '0.7.2'` and `src/onboarding.ts` wrote the same string into the consent file — both untouched since the 0.7.x line. MCP clients (Claude Code, Cursor, Windsurf, Zed) display this string in their server metadata, so users on v0.8.5 saw "clawdcursor v0.7.2" in their host UI. Both sites now read `0.8.6`. `src/index.ts:1054`, `src/onboarding.ts:31`.
1589
+
1590
+ ### Added
1591
+
1592
+ - **`SECURITY.md`** — private vulnerability reporting path for a tool that runs with full Accessibility + Screen Recording permissions on the user's desktop. Points reporters at GitHub's private vulnerability reporting flow plus a mailbox fallback. Should have existed since v0.7.0; closing the gap now.
1593
+
1594
+ ### Changed
1595
+
1596
+ - **Homepage simplified.** `docs/index.html` lost ~80 lines of decorative weight without losing information:
1597
+ - Removed the page-wide green AI-cursor mouse-follower (CSS + HTML + JS, ~60 lines). Cute, but contradicts the "serious skill, not a demo" framing.
1598
+ - Hero badge collapsed from a 4-fact release-summary string to a one-line `v0.8.6 — latest stable`. Release detail belongs in CHANGELOG, not the hero.
1599
+ - Stats grid pruned from 4 tiles to 3 — the `any AI Model` tile was filler.
1600
+ - "CLI Agent" mode card relabeled `CLI — testing only` to match the README's skill-first reframe (in 0.8.4) where `start` is explicitly the testing/troubleshooting path, not a recommended runtime mode.
1601
+ - The `clawdcursor doctor` post-install comment used to read `# verify install + wire into your agent (MCP)`; `doctor` does not write to host config files. Corrected to `# verify install — then add the MCP block to your agent host config`.
1602
+ - **`LICENSE`** copyright year `2026` → `2025-2026`. The earliest CHANGELOG entry is March 2025.
1603
+
1604
+ ### Removed
1605
+
1606
+ - **`V0.7.5-SPEC.md`** at the repo root — describes the v0.7.5 OCR+a11y parallel-merge architecture, which was superseded by the unified blind-first pipeline in v0.8.1/v0.8.2. Five releases of stale content with zero inbound references. Preserved in git history.
1607
+ - **`docs/v0.7.0/`, `docs/v0.7.2/`, `docs/v0.7.12/`, `docs/v0.7.14/`** — pinned-version landing pages for releases that were never published as GitHub Releases. Not linked from the live homepage or README. `docs/v0.7.5/` kept (only pre-0.8 release with a published GitHub Release).
1608
+
1609
+ ### Documentation
1610
+
1611
+ - **GitHub Releases backfilled.** Tags v0.8.0, v0.8.2, v0.8.3, v0.8.4, v0.8.5 had existed for weeks without a corresponding Releases entry — only v0.7.5 was published. All five 0.8.x releases now have a Releases entry sourced from this CHANGELOG, with v0.8.5 marked latest until v0.8.6 ships.
1612
+ - SKILL.md "What's new" expanded to cover 0.8.6.
1613
+
1614
+ ---
1615
+
1616
+ ## [0.8.5] - 2026-04-30 — Review-fix maintenance + compact-tool keyboard fix
1617
+
1618
+ Two remote review passes (six findings + ten findings) on the v0.8.4 docs uncovered one real behavior bug, several factually wrong install instructions, and a long tail of documentation drift that had built up across SKILL.md, README, docs/index.html, and source comments. This release closes all of it. 429/430 tests still pass; granular schema snapshot unchanged.
1619
+
1620
+ ### Fixed
1621
+
1622
+ - **`computer({"action":"key","combo":"..."})` now works.** The compound `key` / `key_press` / `key_down` / `key_up` actions had no `argRemap`, so the schema exposed `key` (not `combo`). REST rejected `combo` as an unknown parameter; MCP silently dropped it and the granular handler crashed with `(undefined).toLowerCase()`. Implemented the remap that `compact.ts:46-47` had documented as the canonical example since v0.8.1 — `argRemap: { combo: 'key' }` on all four keyboard actions. Granular schema is unaffected; the `key` granular tool still takes `key`. `src/tools/compact.ts`.
1623
+ - **Stale "72 granular tools" count** in user-visible places — `clawdcursor mcp --help`, the markdown returned by `GET /docs`, plus four internal source comments. CHANGELOG v0.8.2 established 74 (72 + 2 Electron-bridge tools) as canonical; the agent-facing surfaces are now consistent. `src/index.ts`, `src/tool-server.ts`, `src/tools/compact.ts`, `src/tools/index.ts`.
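
The `combo` → `key` remap from the first fix above can be pictured as a rename step that runs before the granular handler sees the arguments. This is a minimal sketch, not the code in `src/tools/compact.ts`: only the `argRemap: { combo: 'key' }` shape comes from the changelog, while `applyArgRemap` and the `Args` type are hypothetical names used for illustration.

```typescript
// Hypothetical helper: rename compact-tool argument names to the names the
// granular handler expects. Only the { combo: 'key' } mapping is taken from
// the changelog; the function and type names are assumptions.
type Args = Record<string, unknown>;

function applyArgRemap(args: Args, argRemap: Record<string, string>): Args {
  const renames = new Map<string, string>(Object.entries(argRemap));
  const out: Args = {};
  for (const [name, value] of Object.entries(args)) {
    // Use the remapped name when one exists, otherwise keep the original.
    out[renames.get(name) ?? name] = value;
  }
  return out;
}

// A compact call that passes `combo` ends up with `key` for the handler.
const granularArgs = applyArgRemap(
  { action: "key", combo: "ctrl+s" },
  { combo: "key" },
);
console.log(granularArgs);
```

Without a rename step like this, a handler that reads `key` would receive `undefined`, which is consistent with the `(undefined).toLowerCase()` crash described above.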
1624
+
1625
+ ### Documentation
1626
+
1627
+ - **README installer claims rewritten.** The previous wording falsely claimed the installer (1) drops files into `~/.clawdcursor`, (2) registers an MCP server in `~/.claude/settings.json`, and (3) copies SKILL.md into every detected agent's skill directory. Verified against `docs/install.sh` and `docs/install.ps1`: the installer only clones to `~/clawdcursor` (no dot), runs `npm install` and the build, and `npm link`s the global shim. The dotted `~/.clawdcursor/` directory holds runtime state only. The instructions for wiring the skill into Claude Code now correctly say the JSON block is required, not optional.
1628
+ - **Compact-action surface corrections.** The README's compact-tool table used invented action names — `accessibility.read_screen` (actual: `read_tree`), `accessibility.get_focused` (`focused`), `window.set_state`/`set_bounds`/`get_active` (none exist), `system.open_app` (lives on `window`), `system.read_clipboard` (`clipboard_read`), `browser.navigate` (lives on `window`), and the entire `task` action enum (`task` has no enum — just `{instruction}`). All rewritten against `src/tools/compact.ts`. Marquee example also fixed to use real calls.
1629
+ - **Linux accessibility package.** Was `at-spi2-core` + `python3-gi`; the actual missing package on a fresh Ubuntu install is `gir1.2-atspi-2.0` (the AT-SPI typelib that `python3-gi` consumes). Brought into line with SKILL.md, the probe script's hint, and the platform adapter docstring.
1630
+ - **Compact-action tables now non-exhaustive by default.** Added a "Most-used actions" header + caveat pointing to `GET /tools?mode=compact`, and filled in the high-value entries that had been silently dropped (`accessibility.list_children`, `browser.page_context`, `window.list_displays` / `screen_size` / `switch_tab`, `computer.scroll_horizontal` / `triple_click`).
1631
+ - **`clawdcursor dashboard` removed** from the README CLI block — that command never existed; the dashboard is reachable at `http://127.0.0.1:3847` while `serve` or `start` is running. `status` and `consent` subcommands added to the CLI block since they were referenced in the Options block but never introduced.
1632
+ - **`--compact` / `--accept` flag scopes corrected.** README claimed `--compact` works on `serve`; it's mcp-only (`serve` uses `?mode=compact` on `GET /tools`). README claimed `--accept` is universal; it lives on `start` and `consent` (`serve` uses `--skip-consent`).
1633
+ - **"Anthropic Agent SDK" → "Claude Agent SDK"** (the official product name) across README.
1634
+ - **`invoke_element` recategorized** from "Window / App" to "Accessibility" in the README — matches its registration in `src/tools/a11y_depth.ts` and the SKILL.md taxonomy.
1635
+ - **`docs/index.html` install snippets** no longer push `clawdcursor start` as the canonical post-install step (contradicts the new "skill, not application" framing). Replaced with `clawdcursor doctor` (verify-the-install) and a footer note that `start` is testing-only. Hero badge CVE list now includes `follow-redirects`.
1636
+ - **SKILL.md `/health` example** now uses `<x.y.z>` placeholder instead of a hard-coded version that drifts every release. "What's new" section expanded to cover 0.8.4 + 0.8.3 + 0.8.2.
1637
+ - **Cost-tier ladder + "no task is impossible" callout** added to SKILL.md (lines 38, 108-118). Sets the default agent disposition: GUI + mouse + keyboard = everything you need; start at T1 (structured a11y), escalate only when the current tier fails.
1638
+ - **Skill-first README rewrite.** The headline now reads "The skill that gives any AI agent eyes, hands, and a keyboard on a real desktop." `start` / `task` are demoted to a "Testing and Troubleshooting" appendix with explicit guidance that agents should not invoke them — they go through MCP or the REST surface. Replaces the earlier "OS-level desktop automation server" framing.
1639
+ - **Stale tagline cleanup.** Removed "ears" (no audio capture exists in `src/`) from `package.json` description, SKILL.md frontmatter, and `docs/index.html` meta tags + agent-readable summary. Aligned with the README's existing "eyes, hands, and a keyboard" wording.
1640
+ - **Pre-existing fix while in the area:** dropped the blocking `clawdcursor serve` step from `metadata.openclaw.install` in SKILL.md. `serve` is a foreground HTTP server with no auto-exit; using it as a sequential install step would either hang the installer or leave a zombie daemon — directly contradicts the "nothing runs in the foreground" framing.
1641
+
1642
+ ### Verified, not changed
1643
+
1644
+ - **Cmd+Q is blocked.** Review claimed Cmd+Q is not actually blocked by the safety layer. Verified against `src/pipeline/playbooks/keys-blocklist.ts:24` + `src/pipeline/safety/layer.ts:325-328`: it IS blocked through the SafetyLayer chokepoint via both `combo` and `key` arg paths. README is correct; no change needed.
1645
+
1646
+ ---
1647
+
1648
+ ## [0.8.4] - 2026-04-21 — Security maintenance + README rewrite
1649
+
1650
+ Dependency audit release. No functional changes, no schema changes, 429/430 tests still pass.
1651
+
1652
+ ### Security
1653
+
1654
+ Patched every fixable advisory in the dependency tree (5 of 12 surfaced by `npm audit`). The remaining 7 moderate alerts all chain through `jimp → @nut-tree-fork/nut-js` and have no upstream fix yet; tracked for a follow-up once nut-js releases a jimp upgrade.
1655
+
1656
+ - **`vite`** → 7.3.2+ · **High** · path traversal in optimized-deps `.map` handling ([GHSA-4w7w-66w2-5vf9](https://github.com/advisories/GHSA-4w7w-66w2-5vf9)), `server.fs.deny` bypass via query strings ([GHSA-v2wj-q39q-566r](https://github.com/advisories/GHSA-v2wj-q39q-566r)), arbitrary file read via dev-server WebSocket ([GHSA-p9ff-h696-f583](https://github.com/advisories/GHSA-p9ff-h696-f583)).
1657
+ - **`path-to-regexp`** → 0.1.13+ · **High** · ReDoS via multiple route parameters ([GHSA-37ch-88jc-xwx2](https://github.com/advisories/GHSA-37ch-88jc-xwx2)).
1658
+ - **`picomatch`** → 4.0.4+ · **High** · method injection in POSIX character classes + ReDoS via extglob quantifiers ([GHSA-3v7f-55p6-f55p](https://github.com/advisories/GHSA-3v7f-55p6-f55p), [GHSA-c2c7-rcm5-vvqj](https://github.com/advisories/GHSA-c2c7-rcm5-vvqj)).
1659
+ - **`hono`** → 4.12.14+ · Moderate · HTML injection in `hono/jsx` SSR via unsafe attribute names ([GHSA-458j-xx4x-4375](https://github.com/advisories/GHSA-458j-xx4x-4375)).
1660
+ - **`follow-redirects`** → 1.15.12+ · Moderate · custom auth headers leaked across cross-domain redirects ([GHSA-r4q5-vmmm-2653](https://github.com/advisories/GHSA-r4q5-vmmm-2653)).
1661
+
1662
+ ### Changed
1663
+
1664
+ - **README rewrite.** Removed stale "What's New in v0.8.0 — V2 Architecture" headliner (v0.8.0's V2-vs-legacy split was unified in v0.8.2 — no opt-in flag, no two pipelines). Pipeline section now reflects the unified blind → hybrid → vision router, the `safety.evaluate()` chokepoint, ground-truth verification, and the v0.8.3 runaway guard. Tool surface reorganized around the 6-tool compact catalog and the 74-tool granular catalog. Tone tightened; marketing phrasing trimmed.
1665
+
1666
+ ---
1667
+
1668
+ ## [0.8.3] - 2026-04-19 — Hotfix: "Outlook keeps opening" + runaway guard
1669
+
1670
+ User reported Outlook launching repeatedly during a test. Root-cause diagnosis traced to three compounding failures: (1) `PlatformAdapter.openApp` spawned a new instance even when the app was already running, (2) the escalation ladder (router → blind → hybrid → vision) re-ran `open_app` at each rung because earlier rungs couldn't verify success through New Outlook's sparse WebView2 accessibility tree, (3) `clawdcursor stop` only killed the `start` process on port 3847, missing `serve` (different port / same port different process) and `mcp` (stdio, no port) entirely. A stale `serve` kept receiving MCP traffic after the user thought they'd stopped everything.
1671
+
1672
+ ### Fixed
1673
+
1674
+ - **`openApp` / `launchApp` idempotency** (Windows + macOS + Linux). When the target app already has a visible window AND the caller didn't set `alwaysNewInstance: true` AND no `url` is passed, the adapter now focuses the existing window and returns its pid instead of spawning another instance. Match policy: case-insensitive exact processName → processName substring → title substring → UWP AppId tail. Closes the "N windows of Outlook stacking up" class of bug under any retry loop. `src/v2/platform/{windows,macos,linux}.ts`.
1675
+ - **Agent runaway guard** — if the agent calls the same tool + identical args ≥ 3 times within the last 6 turns, the loop exits with `give_up` and a targeted message suggesting `detect_webview_apps` when the target is likely Electron/WebView2. Prevents the generalized "retry-loop-because-a11y-is-opaque" anti-pattern. `src/pipeline/agent/agent.ts`.
1676
+ - **`clawdcursor stop` now sweeps all modes.** After the graceful `/stop` on port 3847, iterates every pidfile in `~/.clawdcursor/*.pid`, SIGTERMs any live pid, SIGKILLs after 500ms if still running, and unlinks the pidfile. Catches `mcp` (stdio-only), zombie `serve`, and any start/serve on a non-default port. `src/index.ts`.
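
The runaway guard described above (same tool and identical args at least 3 times within the last 6 turns) could look roughly like the following. This is a sketch under assumptions: `ToolCall`, `callSignature`, and `isRunaway` are hypothetical names, and comparing arguments via a key-sorted JSON string is one possible way to define "identical"; the real logic lives in `src/pipeline/agent/agent.ts` and may differ.

```typescript
// Hypothetical shape of one agent turn's tool call.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// Build a comparable signature. Top-level keys are sorted so argument order
// does not matter (nested objects are not normalized in this sketch).
function callSignature(call: ToolCall): string {
  const pairs = Object.keys(call.args)
    .sort()
    .map((key) => [key, call.args[key]]);
  return call.tool + ":" + JSON.stringify(pairs);
}

// True when any single call signature appears `threshold` or more times
// within the last `windowSize` turns.
function isRunaway(history: ToolCall[], windowSize = 6, threshold = 3): boolean {
  const counts = new Map<string, number>();
  for (const call of history.slice(-windowSize)) {
    const signature = callSignature(call);
    const seen = (counts.get(signature) ?? 0) + 1;
    if (seen >= threshold) return true;
    counts.set(signature, seen);
  }
  return false;
}
```

A loop using this would check `isRunaway(history)` after each turn and exit with `give_up` when it returns true.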
1677
+
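
The `openApp` match policy listed in the idempotency fix (exact processName, then processName substring, then title substring, then UWP AppId tail) is an ordered fallback chain. The sketch below shows one way to express that ordering. `WindowInfo` and `findExistingWindow` are hypothetical, and reading "AppId tail" as a case-insensitive suffix match is an assumption.

```typescript
// Hypothetical description of a visible top-level window.
interface WindowInfo {
  pid: number;
  processName: string;
  title: string;
  appId?: string; // UWP AppId, when the platform reports one
}

// Try each rule in priority order and return the first window that matches.
// A stricter rule always wins over a looser one, regardless of list order.
function findExistingWindow(query: string, windows: WindowInfo[]): WindowInfo | undefined {
  const q = query.toLowerCase();
  const rules: Array<(w: WindowInfo) => boolean> = [
    (w) => w.processName.toLowerCase() === q,         // exact processName
    (w) => w.processName.toLowerCase().includes(q),   // processName substring
    (w) => w.title.toLowerCase().includes(q),         // title substring
    (w) => (w.appId ?? "").toLowerCase().endsWith(q), // AppId tail (assumed: suffix)
  ];
  for (const rule of rules) {
    const hit = windows.find(rule);
    if (hit !== undefined) return hit;
  }
  return undefined;
}
```

When a match is found, the adapter would focus that window and return its `pid` instead of spawning a new instance.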
1678
+ ### Notes
1679
+
1680
+ - Stale-pidfile cleanup at startup was already correct via `claimPidFile` (checks `isProcessAlive(existingPid)` and overwrites when dead) — no code change needed there; the issue was exclusively `stop`.
1681
+ - Tests: 429 / 430 pass (1 skipped, same as 0.8.2). No schema snapshot change — these are behavioral fixes, not catalog changes.
1682
+
1683
+ ---
1684
+
1685
+ ## [0.8.2] - 2026-04-19 — Session reliability, force-focus, Electron bridge
1686
+
1687
+ First-time-user review surfaced six concrete pain points. This release fixes every one.
1688
+
1689
+ ### Fixed
1690
+
1691
+ - **Silent 401 mid-session** (the session-killer). Previous versions compared the incoming Bearer token against an in-memory `SERVER_TOKEN` only. A second clawdcursor process (stale pidfile takeover, or a concurrent mode) rewrote the token FILE without updating the first server's in-memory copy — clients reading the file silently lost auth. `/health` kept returning 200 so the failure was invisible. Fix: `requireAuth` now accepts EITHER the in-memory token OR the current on-disk token (mtime-cached, ~free). Drift is logged once with a recovery hint. `src/server.ts`.
1692
+ - **`focus_window` force-to-front on Windows.** Previous implementation called `SetForegroundWindow` which the OS blocks when the caller isn't the current foreground process. New implementation uses the full sequence: `ShowWindow(SW_RESTORE)` → topmost-toggle → `AttachThreadInput` with the current foreground thread → `AllowSetForegroundWindow(ASFW_ANY)` → `BringWindowToTop` → `SetForegroundWindow`, with an Alt-key synthetic fallback. Raises any window through Windows' foreground lock. `scripts/ps-bridge.ps1`.
1693
+ - **Richer validation errors.** REST `/execute` rejections now carry the full expected tool signature. A missing param returns `Missing required parameter "target". Expected smart_click(target: string, processId?: number).` — agents no longer have to roundtrip to `/docs`. `src/tool-server.ts`.
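
The "accept EITHER the in-memory token OR the current on-disk token" rule from the silent-401 fix can be sketched as below. This is illustrative only: `TokenSource` and `createTokenChecker` are hypothetical names, and the file access is injected so the caching logic stays visible. In a real server the two callbacks would presumably wrap `fs.statSync(path).mtimeMs` and `fs.readFileSync(path, "utf8")`.

```typescript
// Hypothetical abstraction over the token file on disk.
interface TokenSource {
  mtimeMs(): number; // modification time of the token file
  read(): string;    // current contents of the token file
}

function createTokenChecker(inMemoryToken: string, file: TokenSource) {
  let cachedMtime = -1;
  let cachedDiskToken = "";
  let driftLogged = false;

  return function isAuthorized(presented: string): boolean {
    // Fast path: the token this process generated at startup.
    if (presented === inMemoryToken) return true;

    // Re-read the file only when its mtime changed since the last read.
    const mtime = file.mtimeMs();
    if (mtime !== cachedMtime) {
      cachedDiskToken = file.read().trim();
      cachedMtime = mtime;
    }

    if (cachedDiskToken !== "" && presented === cachedDiskToken) {
      if (!driftLogged) {
        console.warn("Token on disk differs from in-memory token; accepting the on-disk token.");
        driftLogged = true; // log the drift once, as the changelog describes
      }
      return true;
    }
    return false;
  };
}
```

A production version would likely compare tokens in constant time (for example with Node's `crypto.timingSafeEqual`); the plain `===` here keeps the sketch short.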
1694
+
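
The richer validation message quoted above can be generated from a tool's parameter list. The following sketch reproduces the quoted `smart_click` message; `ParamSpec`, `ToolSpec`, `formatSignature`, and `missingParamError` are hypothetical names and do not reflect the actual types in `src/tool-server.ts`.

```typescript
// Hypothetical description of a tool and its parameters.
interface ParamSpec {
  name: string;
  type: string;
  required: boolean;
}

interface ToolSpec {
  name: string;
  params: ParamSpec[];
}

// Render "tool(a: string, b?: number)", marking optional params with "?".
function formatSignature(tool: ToolSpec): string {
  const parts = tool.params.map(
    (p) => `${p.name}${p.required ? "" : "?"}: ${p.type}`,
  );
  return `${tool.name}(${parts.join(", ")})`;
}

// Return an error message for the first missing required parameter, or null.
function missingParamError(tool: ToolSpec, args: Record<string, unknown>): string | null {
  const missing = tool.params.find((p) => p.required && args[p.name] === undefined);
  if (missing === undefined) return null;
  return `Missing required parameter "${missing.name}". Expected ${formatSignature(tool)}.`;
}

const smartClick: ToolSpec = {
  name: "smart_click",
  params: [
    { name: "target", type: "string", required: true },
    { name: "processId", type: "number", required: false },
  ],
};
console.log(missingParamError(smartClick, { processId: 42 }));
```

Carrying the full signature in the error lets an agent correct the call in the next turn without fetching `/docs`.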
1695
+ ### Added
1696
+
1697
+ - **Electron / WebView2 detection.** New MCP tools `detect_webview_apps` and `relaunch_with_cdp` (also exposed via compact `system({"action":"detect_webview"})` / `system({"action":"relaunch_with_cdp"})`). Recognises olk (New Outlook), Teams, Discord, Slack, VS Code, GitHub Desktop, Notion, Obsidian, Spotify. When detected, probes ports 9222/9223/9229/8315 for a live CDP endpoint; if found, tells the agent to attach via `browser({"action":"connect"})`. If not, shows the exact relaunch command (e.g. `discord --remote-debugging-port=9222`) so CDP can be enabled and the sparse UIA tree bypassed entirely. `src/tools/electron_bridge.ts`.
1698
+ - **`drag_path` documentation clarity.** Existing `mouse_drag_stepped` / compact `computer({"action":"drag_path","path":"[...]"})` now explicitly documented for freehand curve drawing (Paint, Figma, canvas apps). SKILL.md "Quick reference" covers when to use `drag_path` vs `drag`.
1699
+
1700
+ ### Changed
1701
+
1702
+ - **SKILL.md pushes compact mode harder.** Top of doc now carries a directive callout: *"If you are an LLM reading this: YOU SHOULD BE USING COMPACT MODE."* with MCP config + REST URL. Granular stays available but is explicitly labeled the power-user / larger-prompt option.
1703
+ - **SKILL.md web-app keyboard warning.** Web-wrapped apps (Outlook, Teams, Gmail) treat `Escape` as "close dialog/modal" — sometimes closing the compose window. Documented: do not use Escape to dismiss autocompletes in web apps; use arrow keys + Enter or click-away.
1704
+ - **Error-recovery table** expanded with Electron-vs-true-canvas split, v0.8.2 auth recovery, v0.8.2 force-focus note, and the `drag_path` vs `drag` distinction.
1705
+
1706
+ ### Tests
1707
+
1708
+ - 429 / 430 passing (one skipped, same as 0.8.0).
1709
+ - Schema snapshot regenerated → 74 granular tools (72 + 2 Electron bridge).
1710
+ - Live smoke: token auth survives a second `clawdcursor serve`; `focus_window` raises Paint through a full-screen window; `detect_webview_apps` correctly flags Outlook / Teams / VS Code when any are open.
1711
+
1712
+ ### Consolidates v0.8.1 (never tagged)
1713
+
1714
+ 0.8.1-alpha.0 through -alpha.N shipped unified-pipeline + compact-MCP + Linux AT-SPI + Wayland routing on the feature branch. They roll into 0.8.2 as a single stable release. See the v0.8.1-alpha tag range in the git history for per-tranche detail; headline features:
1715
+
1716
+ - **Unified blind/hybrid/vision agent** — one loop, three modes. Replaces the v0.8.0 split `text-agent` + `vision-agent` with a single harness using native `tool_use` (Anthropic) / `tool_calls` (OpenAI) / prose-JSON fallback.
1717
+ - **Compact MCP surface** — 6 compound tools (`computer`, `accessibility`, `window`, `system`, `browser`, `task`) that collapse the full capability into ~1,500 tokens of catalog. Anthropic-Computer-Use shape extended across the whole product. `clawdcursor mcp --compact` or `GET /tools?mode=compact`.
1718
+ - **PlatformAdapter widened** — `mouseDown/Up`, `keyDown/Up`, `setWindowState`, `setWindowBounds`, `listDisplays`, `waitForElement`, widened `InvokeAction` (`expand`/`collapse`/`toggle`/`select`/`get-value`), richer `UiElement` state flags.
1719
+ - **Linux AT-SPI bridge** — read-only first pass via `python3-gi` + `gir1.2-atspi-2.0`. Linux a11y methods (`getUiTree`, `findElements`, `getFocusedElement`, `waitForElement`) now return real data on boxes where the bridge dependencies are present. `invokeElement` still stubbed — tracked for a follow-up pass.
1720
+ - **Linux Wayland input routing** — `ydotool` (mouse + keyboard) or `wtype` (keyboard fallback) detected at init. X11 path unchanged; Wayland no longer silently mis-fires through nut-js.
1721
+ - **Per-capability palettes + compound vision tools** — text-agent turns now see a 6-10 tool scoped palette based on the subtask's capability (`app_launch` / `text_input` / `navigation` / `form_fill` / `spatial` / `file_ops` / `window_mgmt` / `general`). Vision-agent turns see 3 compound `mouse` / `keyboard` / `window` tools with action enums. ~12× fewer catalog tokens per turn.
1722
+ - **Pretty TTY logs with HH:MM:SS timestamps** — layer-tagged (`[router]`, `[blind]`, `[vision]`, `[safety]`, etc.), no per-line repetition, `CLAWD_LOG=pretty` default on TTY.
1723
+ - **SKILL.md rewrite** — reviewed by a Sonnet subagent against legacy v0.6.3/v0.7.14 tone, verified model-agnostic + OS-agnostic, restored "USE AS A FALLBACK" + "IMPORTANT — READ THIS BEFORE ANYTHING ELSE" directive callouts and Sensitive App Policy.
1724
+
1725
+ ---
1726
+
1727
+ ## [0.8.0] - 2026-04-16 — V2 Architecture (opt-in)
1728
+
1729
+ A ground-up reimagining of the internal pipeline. Opt in with `clawdcursor start --v2`. The legacy pipeline is unchanged and remains the default.
1730
+
1731
+ ### Added
1732
+
1733
+ - **`--v2` flag on `clawdcursor start`** — activates the new 3-layer architecture: Router → VisionAgent → Verifier. No effect on MCP, `serve`, or legacy `start`.
1734
+ - **`src/v2/platform/`** — platform abstraction. Single `PlatformAdapter` interface with `macos.ts`, `windows.ts`, `linux.ts` implementations. Replaces 142+ scattered `if (process.platform === 'darwin')` branches across 34 files. Business logic no longer sees `process.platform`. Adding a new OS = one file.
1735
+ - **`src/v2/verifier/`** — `GroundTruthVerifier`. Six independent signals decide whether a task actually completed: pixel diff, window change, focus change, OCR delta, task-specific assertions (`send_email`, `navigate_url`, `open_app`, `type_text`, `search`, `compose_message`, `create_file`), and anti-patterns (error dialogs, "cannot send", "draft saved", invalid recipient, auth failed). Weighted voting with hard-fail rules on anti-patterns. Cannot be fooled by LLM self-reported "done".
1736
+ - **`src/v2/agent/`** — `VisionAgent`: a single vision-first tool-use loop. 16 tools (`screenshot`, `read_screen`, `list_windows`, `click`, `drag`, `scroll`, `type`, `key`, `invoke_element`, `set_field_value`, `open_app`, `focus_window`, `read_clipboard`, `write_clipboard`, `wait`, `done`). 6-rule system prompt (down from 36). Model-agnostic via existing `callVisionLLM`.
1737
+ - **`src/v2/orchestrator.ts`** — `PipelineV2` wires Router → VisionAgent → Verifier with before/after state capture.
1738
+ - **Hardened JSON parser** — tolerates trailing braces, markdown code fences, and other common LLM malformations. Balanced-brace extraction as fallback.
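
The hardened JSON parser entry mentions markdown code fences, trailing braces, and balanced-brace extraction as a fallback. A minimal sketch of that idea follows. The function names are hypothetical and the real parser may handle more malformations than these.

```typescript
// Return the first balanced {...} span in `text`, ignoring braces that
// appear inside JSON string literals. Returns null when none is found.
function extractBalancedObject(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth += 1;
    else if (ch === "}") {
      depth -= 1;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null;
}

// Strip markdown fence markers, try a strict parse, then fall back to
// balanced-brace extraction (which also drops trailing stray braces).
function parseLenientJson(raw: string): unknown {
  const unfenced = raw.replace(/`{3}(?:json)?/gi, "").trim();
  try {
    return JSON.parse(unfenced);
  } catch {
    // Strict parse failed; fall through to extraction.
  }
  const candidate = extractBalancedObject(unfenced);
  if (candidate === null) throw new Error("no JSON object found in model output");
  return JSON.parse(candidate);
}
```

Tracking string state matters here: a `}` inside a quoted value must not close the object early.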
1739
+
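
The `GroundTruthVerifier` entry describes weighted voting with hard-fail rules on anti-patterns. One way to picture that combination is sketched below. The weights, the 0.5 threshold, and the names `Signal` and `decide` are all assumptions for illustration; the changelog does not state the actual values.

```typescript
// Hypothetical result of one verification signal (pixel diff, OCR delta, ...).
interface Signal {
  name: string;
  passed: boolean;
  weight: number;
}

interface Verdict {
  success: boolean;
  reason: string;
}

// Anti-patterns veto the result outright; otherwise the weighted share of
// passing signals must reach the threshold.
function decide(signals: Signal[], antiPatternsFound: string[], threshold = 0.5): Verdict {
  if (antiPatternsFound.length > 0) {
    return { success: false, reason: "anti-pattern: " + antiPatternsFound.join(", ") };
  }
  const total = signals.reduce((sum, s) => sum + s.weight, 0);
  if (total === 0) return { success: false, reason: "no signals" };
  const passing = signals
    .filter((s) => s.passed)
    .reduce((sum, s) => sum + s.weight, 0);
  const score = passing / total;
  return { success: score >= threshold, reason: "score " + score.toFixed(2) };
}
```

The veto is what allows a case like the "Cannot send" dialog to fail even when every other signal reports a change on screen.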
1740
+ ### Fixed
1741
+
1742
+ - **False positives** — legacy pipeline reports `UNVERIFIED_SUCCESS` when the agent claims "done" but the screen didn't change. V2 verifier catches this class: in a live email-send test the agent said "Email sent" but a "Cannot send" dialog was on screen. V2 correctly rejected the claim. (Legacy still does what it does; this fix only applies when `--v2` is set.)
1743
+
1744
+ ### Testing
1745
+
1746
+ Smoke-tested on macOS with Anthropic Claude Haiku (text) + Sonnet (vision):
1747
+
1748
+ | Task | Time | Verdict |
1749
+ |------|------|---------|
1750
+ | Open TextEdit and type | 30s | ✅ (4/6 signals) |
1751
+ | Calculator: 47+53=100 | 65s | ✅ (5/6 signals, zero parse errors) |
1752
+ | Safari → github.com | 45s | ✅ (6/6 signals) |
1753
+ | Notes: create note | 182s | ✅ (6/6 signals) |
1754
+ | Email send (failing server) | 86s | ❌ **Correctly rejected** — legacy would have reported success |
1755
+
1756
+ ### Platform Safety
1757
+
1758
+ No legacy code modified. Windows, Linux, and MCP paths untouched. v2 code is entirely under `src/v2/`.
1759
+
1760
+ ## [0.7.14] - 2026-04-13 — Full macOS Keyboard Automation + Platform-Aware Pipeline
1761
+
1762
+ ### Fixed
1763
+ - **macOS keystrokes silently dropped** — root cause: `CGEvent.post()` from the Swift helper is blocked by macOS TCC when the helper is spawned as a child of Node.js. `keyPress()` and `typeText()` on macOS now route through `osascript` + System Events (the Apple-sanctioned method). All keyboard shortcuts (Cmd+V, Cmd+N, Shift+Cmd+D, etc.) now work correctly.
1764
+ - **Single-char keys losing modifiers** — `keycodeForCharacter()` lookup added to `ClawdCursorHelper`; modifiers are no longer discarded for Cmd+letter combos.
1765
+ - **`asDouble()` coercion** — click/drag coordinates sent as integers (common from some LLMs) no longer fail with a type mismatch in the Swift helper.
1766
+ - **`keycodeForCharacter` fallback** — now returns an error for unmapped characters instead of silently falling back to the 'v' keycode.
1767
+ - **Permission check inconsistency** — `doctor`, `status`, and `readiness.ts` all now query the same canonical path: Host `/status` → `permission-check` binary → direct fallback. No more false "granted" reports.
1768
+ - **Screenshot capture CPU spin** — replaced `CGWindowListCreateImage` (triggers ReplayKit CPU spin bug on macOS 14+) with a delegated `screenshot-helper` subprocess.
1769
+ - **A11y false positive** — `isShellAvailable()` now tests actual window access (`p.windows.length`) instead of `processes.length`, which worked without Accessibility permission.
1770
+ - **Node.js v25 crash** — `EINVAL`/`setTypeOfService` socket error from undici's internal QoS call is now caught and suppressed (non-fatal).
1771
+ - **Dock click zone** — reduced from 60px to 30px on macOS (Dock is thinner than the Windows taskbar).
1772
+ - **Browser URL bar shortcut** — `Cmd+L` used on macOS (was `Ctrl+L`, which does nothing in macOS browsers).
1773
+
1774
+ ### Added
1775
+ - **`macMailEmailFlow`** — deterministic email flow for macOS Mail.app (Cmd+N, Tab to subject/body, Cmd+Shift+D to send).
1776
+ - **`clawdcursor grant` command** — triggers macOS system permission dialogs directly from the CLI.
1777
+ - **115 Apple shortcuts** — Mail, Safari, Notes, Messages, Terminal added to the shortcut database.
1778
+ - **`scripts/test-macos-fixes.sh`** — one-shot E2E verification script: rebuild, binary check, permission consistency, screenshot capture, doctor cross-check.
1779
+ - **`--request-screen-recording` flag** on `permission-check` binary — optional TCC dialog trigger for Screen Recording.
1780
+ - **`processPath` + `bundleId`** in all permission check responses — aids TCC debugging.
1781
+ - **30s TTL cache** on A11y shell availability — permission grants mid-session are now detected without restart.
1782
+ - **macOS native binary verification** in `scripts/verify-install.js` — warns on missing binaries at `npm install` time.
1783
+ - **`setup` script auto-builds** native binaries on macOS (inside `npm run setup`).
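
The 30s TTL cache on A11y shell availability listed above is a standard time-bounded memoization. A generic sketch follows; `cachedWithTtl` is a hypothetical helper, and the injectable clock exists only so the behavior can be checked without waiting.

```typescript
// Wrap an expensive check so its result is reused for `ttlMs` milliseconds.
// `now` is injectable for testing and defaults to the wall clock.
function cachedWithTtl<T>(
  compute: () => T,
  ttlMs: number,
  now: () => number = Date.now,
): () => T {
  let hasValue = false;
  let value: T | undefined;
  let expiresAt = 0;
  return () => {
    const t = now();
    if (!hasValue || t >= expiresAt) {
      value = compute();
      expiresAt = t + ttlMs;
      hasValue = true;
    }
    return value as T;
  };
}

// Example: a permission probe that is re-run at most every 30 seconds.
let probeRuns = 0;
const isShellAvailableCached = cachedWithTtl(() => {
  probeRuns += 1; // stands in for the real (expensive) accessibility probe
  return true;
}, 30_000);
isShellAvailableCached();
isShellAvailableCached();
console.log(probeRuns);
```

Because the cached value expires, a permission granted mid-session is picked up on the first call after the TTL elapses, with no restart needed.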
1784
+
1785
+ ### Changed
1786
+ - **`build.sh`** — marked executable in git, fails fast on missing binaries (was silently warning), better error guidance.
1787
+ - **Installer** — verifies all 4 required binaries (not just `ClawdCursorHost`), uses `bash ./build.sh` for portability.
1788
+ - **`doctor.ts`** — permission check unified via `native-helper` module; triggers system permission dialogs if denied.
1789
+ - **Email flow keyboard shortcuts** — platform-aware: `Ctrl+Enter` → `Shift+Cmd+D` on macOS, `Ctrl+H` → `Cmd+Option+F` for Find & Replace.
1790
+ - **`sharp`** bumped `^0.33.0` → `^0.33.5`.
1791
+
1792
+ ### Platform Safety
1793
+ No Windows or Linux code paths affected. All macOS changes are gated behind `IS_MAC` / `process.platform === 'darwin'` / `isMacOS()`.
1794
+
1795
+ ## [0.7.13] - 2026-04-10 — Unified Permission Checks + Screenshot Helper
1796
+
1797
+ ### Fixed
1798
+ - **Permission check fragmentation** — doctor, status, and readiness each used different permission APIs, producing contradictory results. All now route through `ClawdCursorHost /status` → `permission-check` binary → direct `AXIsProcessTrusted` fallback.
1799
+ - **Screenshot CPU spin** — delegated `takeScreenshot()` to `screenshot-helper` subprocess, eliminating the ReplayKit CPU spike on macOS 14+.
1800
+ - **Installer binary verification** — now checks all 4 required binaries (`ClawdCursorHost`, `clawdcursor-helper`, `screenshot-helper`, `permission-check`) instead of just `ClawdCursorHost`.
1801
+ - **`build.sh` silent failures** — `swift build` errors now fail the build immediately with actionable guidance.
1802
+
1803
+ ### Added
1804
+ - **`clawdcursor grant` command** — triggers macOS system permission dialogs for Accessibility and Screen Recording.
1805
+ - **`processPath` + `bundleId`** in permission check responses for TCC debugging.
1806
+ - **`--request-screen-recording` flag** on `permission-check` binary.
1807
+
1808
+ ## [0.7.12] - 2026-04-09 — Comprehensive macOS TCC Fix
1809
+
1810
+ ### Fixed
1811
+ - **Bash pipeline bug** — `set -o pipefail` added; build failures now properly detected (was silently passing due to pipeline exit status bug)
1812
+ - **Ad-hoc signing by default** — build.sh now always signs the app (required for TCC on macOS 26+ Tahoe where unsigned binaries don't appear in privacy settings)
1813
+ - **Build error capture** — uses temp file instead of pipe to properly capture exit status
1814
+ - **TCC permission check** — runs permission-check after build to show current accessibility/screen recording status
1815
+
1816
+ ### Changed
1817
+ - **build.sh rewritten** — cleaner structure, ad-hoc signing is default (not optional), signature verification added
1818
+ - **Codesign uses --deep** — ensures all nested binaries are signed
1819
+ - **Installer shows TCC status** — tells user exactly which permissions need to be granted and where
1820
+
1821
+ ### Technical Details
1822
+ The core issue was that TCC (Transparency, Consent, and Control) on macOS binds permissions to the code signing identity. Without signing:
1823
+ - On macOS 26+ (Tahoe), unsigned binaries don't appear in System Settings privacy panels at all
1824
+ - Users saw "ClawdCursorHost binary not found" errors even though install appeared to succeed
1825
+
1826
+ Reference: mediar-ai/mcp-server-macos-use for TCC permission handling patterns.
1827
+
1828
+ ## [0.7.11] - 2026-04-09 — macOS Installer Fix
1829
+
1830
+ ### Fixed
1831
+ - **macOS installer now fails loudly if native host build fails** — was silently swallowing build errors and claiming "optional fallback" that doesn't exist
1832
+ - **Added verification step** — installer explicitly checks ClawdCursorHost binary exists before declaring success
1833
+ - **Show build output** — Swift build errors are now visible instead of redirected to /dev/null
1834
+ - **Clear error messages** — tells users exactly what went wrong and how to fix it (xcode-select --install, manual rebuild, etc.)
1835
+
1836
+ ### Changed
1837
+ - macOS native host is now correctly marked as REQUIRED, not optional
1838
+ - Installer exits with error code 1 if native build fails on macOS
1839
+
1840
+ ## [0.7.10] - 2026-04-08 — Guided Setup Flow
1841
+
1842
+ ### Changed
1843
+ - **Installer shows next steps** — after install, displays clear guidance: `clawdcursor doctor` → `clawdcursor start`
1844
+ - **Doctor shows run options** — after passing all checks, shows both `start` (full agent) and `serve` (tools-only) modes
1845
+ - **Consent shows next step** — after granting consent, directs users to `clawdcursor doctor`
1846
+
1847
+ ## [0.7.9] - 2026-04-08 — UX Improvements
1848
+
1849
+ ### Changed
1850
+ - **macOS permission messages** — now direct users to enable "ClawdCursor" instead of "Terminal/Node"
1851
+ - **Screen Recording path** — updated to "Screen & System Audio Recording" (macOS Sequoia naming)
1852
+
1853
+ ## [0.7.8] - 2026-04-08 — Documentation Fix
1854
+
1855
+ ### Fixed
1856
+ - **Installer comments updated** — example version references now point to v0.7.8
1857
+
1858
+ ## [0.7.7] - 2026-04-08 — Installer Fixes
1859
+
1860
+ ### Fixed
1861
+ - **Installers default to main branch** — install.sh and install.ps1 now use `main` instead of hardcoded non-existent tag
1862
+ - **macOS installer builds native helper** — install.sh now runs `./native/build.sh` on Darwin if Swift is available
1863
+ - **Version override support** — `VERSION=v0.7.7 curl ... | bash` or `$env:VERSION='v0.7.7'` to install specific release
1864
+ - **Auto-pull on update** — installers now run `git pull` after checkout to get latest changes
1865
+
1866
+ ## [0.7.6] - 2026-04-08 — macOS Native Host App
1867
+
1868
+ ### Added
1869
+ - **macOS Host App (ClawdCursorHost)** — new native Swift executable that runs as the app bundle's main process, owning all TCC permissions (Accessibility, Screen Recording) under a single app identity
1870
+ - **Localhost IPC server** — host app exposes `GET /health`, `GET /status`, `POST /rpc` on `127.0.0.1:3848` for CLI→host communication
1871
+ - **Token-based authentication** — `~/.clawdcursor/host-token` (mode 0600) secures the IPC channel
1872
+ - **Auto-launch/stop** — `clawdcursor start` ensures host is running; `clawdcursor stop` gracefully quits it
1873
+ - **New Swift helper methods** — `moveMouse`, `dragMouse`, `captureScreen` for smoother native macOS automation
1874
+ - **Menu bar presence** — host app shows 🐾 icon in menu bar for visibility
1875
+
1876
+ ### Security
1877
+ - **Localhost-only binding** — IPC server uses `NWParameters.requiredLocalEndpoint` to bind to `127.0.0.1` only, rejecting connections from other machines
1878
+ - **Token file permissions** — host-token created with mode 0600 (owner read/write only)
1879
+
1880
+ ### Changed
1881
+ - `src/native-helper.ts` — routes all macOS desktop operations through host IPC instead of direct stdio
1882
+ - `src/native-desktop.ts` — 11 platform-guarded code paths delegate to host on macOS
1883
+ - `src/index.ts` — start/stop commands manage host app lifecycle
1884
+ - `native/ClawdCursor.app/Contents/Info.plist` — bundle identifier changed to `com.clawdcursor.app`, executable to `ClawdCursorHost`
1885
+
1886
+ ### Unchanged
1887
+ - **Windows/Linux** — all macOS code behind `IS_MAC && this.helper` guards; no behavior changes on other platforms
1888
+ - **172 tests pass** — full test suite unchanged
1889
+
1890
+ ## [0.6.3] - 2026-03-01 — Universal Pipeline, Multi-App Workflows, Provider-Agnostic
1891
+
1892
+ ### Added
1893
+ - **LLM-based universal task pre-processor** — one cheap text LLM call decomposes any natural language into `{app, navigate, task, contextHints}`, replacing brittle regex parsing
1894
+ - **Multi-app workflow support** — copy/paste between apps (e.g. Wikipedia → Notepad) with 6-checkpoint tracking: first_app_focused → first_app_action_done → content_copied → second_app_opened → content_pasted → result_visible
1895
+ - **Site-specific keyboard shortcuts** — Reddit (j/k/a/c), Twitter/X (j/k/l/t/r), YouTube (Space/f/m), Gmail (j/k/e/r/c), GitHub (s/t/l), Slack (Ctrl+k), plus generic hints
1896
+ - **OS-level default browser detection** — reads Windows registry (HKCU ProgId) or macOS LaunchServices instead of hardcoded Edge/Safari
1897
+ - **3 verification retries with step log analysis** — when verification fails, builds a digest of recent actions + checkpoint status so the vision LLM can fix the specific missed step
1898
+ - **Mixed-provider pipeline support** — e.g. kimi for text, anthropic for Computer Use, with per-layer API key resolution from OpenClaw auth-profiles
1899
+ - **`ComputerUseOverrides` interface** — apiKey, model, baseUrl per-layer for mixed-provider setups
1900
+ - **`resolveProviderApiKey()` helper** — reads OpenClaw auth-profiles to find the right API key per provider
1901
+
1902
+ ### Fixed
1903
+ - **Checkpoint system overhaul** — removed auto-termination (completionRatio ≥ 0.90 early exit and isComplete() mid-loop kill), strict detection: content_pasted requires Ctrl+V, content_copied requires Ctrl+C, second_app_opened detects any window switch universally
1904
+ - **Pipeline context passing** — `priorContext[]` accumulator flows from pre-processing through to Computer Use (no more amnesia between layers)
1905
+ - **Credential resolution order** — .clawdcursor-config → auth-profiles.json → openclaw.json (with template expansion) → env vars
1906
+ - **`loadPipelineConfig()` path resolution** — checks package dir first, then cwd (fixes global npm installs)
1907
+ - **Smart Interaction model lookup** — uses `PROVIDERS` registry instead of hardcoded model/baseUrl maps; fixes stale `claude-haiku-3-5-20241022` fallback
1908
+ - **Scroll behavior** — system prompts instruct PageDown/Space instead of tiny mouse scrolls; default scroll delta 3 → 15
1909
+ - **Provider-agnostic internals** — all comments and logs say "vision LLM" instead of "Claude"
1910
+ - **Verification retry limit** — max 3 retries prevents infinite verification loops
1911
+ - **Universal checkpoint detection** — no hardcoded app lists; `detectTaskType()` uses action patterns only
1912
+
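The credential resolution order listed under Fixed amounts to a first-match lookup across sources. A minimal sketch, assuming each source has already been loaded into a plain provider-to-key map; the `CredentialSource` type, the `resolveApiKey` name, and the key values are hypothetical and not the package's actual API.

```typescript
// Hypothetical shape: each source maps a provider name to an API key.
type CredentialSource = { name: string; keys: Record<string, string | undefined> };

// Return the first non-empty key for `provider`, scanning sources in priority order.
function resolveApiKey(
  provider: string,
  sources: CredentialSource[],
): { key: string; from: string } | null {
  for (const source of sources) {
    const key = source.keys[provider];
    if (key !== undefined && key.trim() !== "") {
      return { key, from: source.name };
    }
  }
  return null;
}

// Priority order taken from the changelog entry; the key values are made up.
const sources: CredentialSource[] = [
  { name: ".clawdcursor-config", keys: {} },
  { name: "auth-profiles.json", keys: { anthropic: "sk-ant-example" } },
  { name: "openclaw.json", keys: { anthropic: "sk-ant-other", kimi: "kimi-example" } },
  { name: "env", keys: { openai: "sk-example" } },
];

const resolved = resolveApiKey("anthropic", sources);
```

Because the scan stops at the first hit, a key in `auth-profiles.json` shadows one for the same provider in `openclaw.json`, and environment variables act only as a last resort.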
1913
+ ### Changed
1914
+ - Pipeline architecture: LLM Pre-processor → Pre-open app + navigate → L0 Browser → L1 Action Router + Shortcuts → L1.5 Smart Interaction → L2 A11y Reasoner → L3 Computer Use
1915
+ - Pre-processor prompt hardened with NEVER rules (never summarize, never drop steps) and VALIDATION RULE
1916
+ - MULTI-APP WORKFLOWS section added to both Mac and Windows Computer Use system prompts
1917
+ - Checkpoint thresholds tightened: early completion 75% → 90%, skip-verification 50% → 80%
1918
+
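The six multi-app checkpoints and the strict key-combo detection described in this release can be sketched as a small tracker. This is an illustration only: the `Action` shape and the `CheckpointTracker` class are assumptions, and only the three detections the changelog spells out (Ctrl+C, Ctrl+V, any window switch) are implemented.

```typescript
// Checkpoint names copied from the changelog entry, in order.
const CHECKPOINTS = [
  "first_app_focused",
  "first_app_action_done",
  "content_copied",
  "second_app_opened",
  "content_pasted",
  "result_visible",
] as const;
type Checkpoint = (typeof CHECKPOINTS)[number];

// Hypothetical action record; the agent's real action type is not shown in the changelog.
type Action =
  | { kind: "key"; combo: string }
  | { kind: "window_switch"; title: string };

class CheckpointTracker {
  private reached = new Set<Checkpoint>();

  // Strict detection: copy and paste are only credited for the exact key combos.
  observe(action: Action): void {
    if (action.kind === "key") {
      const combo = action.combo.toLowerCase();
      if (combo === "ctrl+c") this.reached.add("content_copied");
      if (combo === "ctrl+v") this.reached.add("content_pasted");
    } else {
      // Any window switch counts, with no per-app list.
      this.reached.add("second_app_opened");
    }
  }

  completionRatio(): number {
    return this.reached.size / CHECKPOINTS.length;
  }

  // Checkpoints still missing, e.g. for a verification-retry digest.
  missing(): Checkpoint[] {
    return CHECKPOINTS.filter((c) => !this.reached.has(c));
  }
}
```

In this sketch the ratio is informational; it feeds reporting and retries rather than terminating the loop.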
1919
+ ## [0.6.5] - 2026-02-28 — Checkpoint System, Task Completion Detection
1920
+
1921
+ ### Added
1922
+ - **Checkpoint-based task completion** — Computer Use tracks milestones (compose opened → fields filled → send pressed → compose closed) and stops when all checkpoints are met. No more wasted calls after successful completion.
1923
+ - **Task type detection** — auto-classifies tasks (email, form, navigate, draw, file_save) and applies appropriate checkpoint templates.
1924
+ - **Smart early termination** — when Claude says "done" and ≥75% of checkpoints are confirmed, completion is accepted immediately.
1925
+ - **Auto-config on first run** — `clawdcursor start` auto-detects providers without needing `clawdcursor doctor`.
1926
+ - **Universal provider support** — any OpenAI-compatible endpoint works via `--base-url`.
1927
+ - **CLI model selection** — `--text-model` and `--vision-model` flags.
1928
+
1929
+ ### Fixed
1930
+ - **Email domain extraction bug** — "send to user@hotmail.com" no longer navigates to hotmail.com. Email addresses are stripped before URL matching.
1931
+ - **Verification override bug** — verification no longer contradicts confirmed checkpoint completion. It is skipped when ≥50% of checkpoints are met.
1932
+ - **Context loss between layers** — Computer Use now receives full context of what pre-processing already did.
1933
+ - **Drawing quality** — minimum 50px drag distances enforced via system prompt.
1934
+ - **OpenClaw credential discovery** — multi-provider scan, template variable resolution, no false overrides.
1935
+ - **Pipeline gate** — Action Router always runs, shortcuts work everywhere.
1936
+
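The email domain fix above (strip addresses first, then look for a URL) can be sketched as follows. The regular expressions and the `extractDomain` name are illustrative assumptions, kept deliberately simple; they are not the package's `extractUrl()` implementation.

```typescript
// Matches email addresses so they can be removed before looking for URLs.
const EMAIL_RE = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;

// Simple domain matcher: optional scheme, then dot-separated labels ending in a 2+ letter TLD.
const DOMAIN_RE = /\b(?:https?:\/\/)?((?:[a-z0-9-]+\.)+[a-z]{2,})\b/i;

function extractDomain(task: string): string | null {
  // Blank out every email address so its domain part can never match below.
  const withoutEmails = task.replace(EMAIL_RE, " ");
  const match = DOMAIN_RE.exec(withoutEmails);
  return match ? match[1].toLowerCase() : null;
}
```

With this ordering, "send to user@hotmail.com" yields no navigation target, while a task that names a site and an address still resolves to the site.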
1937
+ ### Changed
1938
+ - Pipeline pre-processes "open X and Y" tasks — opens app via Action Router (free), then hands remaining task to deeper layers.
1939
+ - Smart Interaction detects visual loop tasks (draw, paint) and skips to Computer Use.
1940
+ - Computer Use system prompt includes Snap Assist handling and drawing guidelines.
1941
+
1942
+ ## [0.6.2] - 2026-02-28 — Universal Provider Support, Auto-Config
1943
+
1944
+ ### Added
1945
+ - **Auto-config on first run** — `clawdcursor start` auto-detects and configures providers without needing `clawdcursor doctor` first. Doctor is now optional for fine-tuning.
1946
+ - **Universal provider support** — any OpenAI-compatible endpoint works. Not limited to 7 hardcoded providers. Use `--base-url` + `--api-key` for custom endpoints.
1947
+ - **CLI model selection** — `--text-model` and `--vision-model` flags on start command.
1948
+ - **Dynamic OpenClaw provider mapping** — reads ALL providers from OpenClaw config, not just known ones. NVIDIA, Fireworks, Mistral, etc. work automatically.
1949
+
1950
+ ### Changed
1951
+ - `clawdcursor start` now auto-runs setup if no config exists (non-interactive)
1952
+ - Provider detection accepts any provider name, falling back to OpenAI-compatible API
1953
+ - `detectProvider()` returns 'generic' for unknown providers instead of defaulting to 'openai'
1954
+
1955
+ ## [0.6.1] - 2026-02-28 — Keyboard Shortcuts, Pipeline Fixes
1956
+
1957
+ ### Added
1958
+ - **Keyboard shortcuts registry** (`src/shortcuts.ts`) — 30+ common actions mapped to direct keystrokes. Scroll, copy, paste, undo, reddit upvote/downvote, browser shortcuts, and more. Zero LLM calls.
1959
+ - **Fuzzy shortcut matching** — "scroll the page down" fuzzy-matches to scroll-down shortcut. Context-aware matching for social media actions.
1960
+ - **Router telemetry** — Action Router now logs match type, confidence, and shortcut hits.
1961
+ - **CDP→UIDriver fallback** — Smart Interaction falls back to accessibility tree automation when browser CDP path fails.
1962
+ - **Gmail, Outlook, Hotmail** added to Browser Layer site map.
1963
+
1964
+ ### Fixed
1965
+ - **Pipeline gate bug** — Action Router was gated behind `!isBrowserTask`, causing shortcuts to be skipped for browser-context tasks (e.g., "reddit upvote" matched browser regex but should use shortcut). Action Router now always runs after Browser Layer.
1966
+ - **URL extraction false positives** — "open gmail and send email to foo@bar.com" no longer extracts `bar.com`. URL extraction now isolates the navigation clause before matching.
1967
+ - **Reliable force-stop** — `clawdcursor stop` now force-kills lingering processes via PID file.
1968
+ - **Provider label inference** — startup logs now clearly show text and vision provider names separately.
1969
+
1970
+ ### Changed
1971
+ - Pipeline order: Browser Layer (L0) → Action Router + Shortcuts (L1) → Smart Interaction (L1.5) → A11y Reasoner (L2) → Vision (L3). Action Router no longer gated.
1972
+ - `extractUrl()` uses navigation clause isolation instead of matching against full task text.
1973
+
1974
+ ## [0.6.0] - 2026-02-28 — Universal Provider Support, OpenClaw Integration
1975
+
1976
+ ### Added
1977
+ - **OpenClaw credential integration** — auto-discovers all configured providers from OpenClaw's `auth-profiles.json` and `openclaw.json`. No separate API key needed when running as an OpenClaw skill.
1978
+ - **Universal provider support** — added Groq, Together AI, DeepSeek as first-class providers with profiles, env var detection, and key prefix recognition.
1979
+ - **Auto-detection as default** — provider defaults to `auto` instead of hardcoding Anthropic. Doctor picks the best available provider automatically.
1980
+ - **Mixed provider pipelines** — use Ollama for text (free) + any cloud provider for vision (best quality). Vision credentials preserved when brain reconfigures for text.
1981
+ - **Dynamic Ollama model selection** — doctor picks the best available Ollama model instead of hardcoding `qwen2.5:7b`.
1982
+ - **Anthropic vision routing fix** — detects Anthropic vision by key prefix (`sk-ant-`) independently of the main provider field, so split-provider setups work correctly.
1983
+
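The key-prefix routing described in the last entry can be illustrated with a short sketch. Only the `sk-ant-` prefix comes from the changelog; the function names and the fallback behavior are assumptions.

```typescript
// Anthropic keys are recognized by prefix, independent of the main provider setting.
function isAnthropicKey(apiKey: string): boolean {
  return apiKey.startsWith("sk-ant-");
}

// Pick the vision provider: the key prefix wins, otherwise fall back to the main provider.
function visionProviderFor(mainProvider: string, visionApiKey: string): string {
  return isAnthropicKey(visionApiKey) ? "anthropic" : mainProvider;
}
```

This is what lets a split setup such as Ollama for text plus an Anthropic key for vision route vision calls correctly.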
1984
+ ### Changed
1985
+ - Default config no longer assumes any specific provider or model
1986
+ - Provider scan loop iterates all registered providers dynamically
1987
+ - Help text and doctor output are provider-agnostic
1988
+ - `--provider` CLI flag accepts any string (not limited to 4 providers)
1989
+ - README updated with 7-provider compatibility table
1990
+
1991
+ ### Security
1992
+ - **SKILL.md hardened** — removed aggressive autonomy language ("use without asking", "be independent")
1993
+ - **Sensitive App Policy** — agents must ask the user before accessing email, banking, messaging, or password managers
1994
+ - **Safety tiers as hard rules** — 🔴 Confirm actions must never be self-approved by agents
1995
+ - **Data flow transparency** — expanded security section documents network isolation, per-provider data flow, and Ollama = fully offline
1996
+ - **No credentials in skill directory** — OpenClaw users get auto-discovery from local config; no keys stored in skill files
1997
+
1998
+ ### Fixed
1999
+ - Vision model crash when main provider set to Ollama but vision uses Anthropic (`model not found` error)
2000
+ - Brain reconfiguration was wiping vision credentials — now preserved
2001
+
2002
+ ---
2003
+
2004
+ ## [0.5.6] - 2026-02-27 — Fluid Decomposition, Interactive Doctor, Smart Vision Fallback
2005
+
2006
+ ### Added
2007
+ - **Fluid LLM task decomposition** — decompose prompt now tells the LLM to reason about what ANY app needs. No more hardcoded examples. "Write me a sentence about dogs" generates actual content instead of typing the literal instruction.
2008
+ - **Interactive doctor onboarding** — after scanning providers, doctor shows all working TEXT and VISION LLM options with ★ recommendations. User picks by number, Enter for default. Shows GPU info (VRAM via nvidia-smi) to help decide local vs cloud.
2009
+ - **Cloud provider guidance** — doctor shows unconfigured providers with signup URLs and lets you paste an API key inline (auto-detects provider, saves to .env).
2010
+ - **Smart vision fallback for compound tasks** — when Router or Reasoner handles part of a multi-step task but fails midway, ALL remaining subtasks are bundled and handed to Computer Use (vision). Prevents false-success trapping in cheap layers.
2011
+ - **Ollama auto-detection** — brain auto-reconfigures to use local Ollama for decomposition when no cloud API key is set. `hasApiKey` now recognizes local LLMs.
2012
+ - **Compound task guard** — action router detects multi-step/compound tasks (commas, "then", "and then") and skips to deeper layers.
2013
+
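The compound task guard can be approximated with a single pattern over the markers the entry names (commas, "then", "and then"). A minimal sketch; the `isCompoundTask` name and the exact regex are assumptions.

```typescript
// True when the task text contains a comma or a standalone "then" / "and then".
function isCompoundTask(task: string): boolean {
  return /,|\b(?:and\s+)?then\b/i.test(task);
}
```

The word boundaries keep words that merely contain "then" (for example "Athens") from triggering the guard.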
2014
+ ### Fixed
2015
+ - **Case-preserving action router** — all regexes now match against the raw (unmodified) task text. Typed text and URLs no longer get lowercased.
2016
+ - **Flexible click matching** — `click Blank document` works without quotes (was requiring `click "Blank document"`). Single unified regex for quoted and unquoted element names.
2017
+ - **PowerShell encoding** — replaced emoji (🐾) and em dash (—) in task console title that broke on Windows PowerShell due to encoding.
2018
+ - **Stale config** — `.clawdcursor-config.json` now correctly reflects Ollama when doctor detects it (was stuck on Anthropic).
2019
+ - **Brain provider mismatch** — decomposition no longer calls Anthropic API when only Ollama is available.
2020
+
2021
+ ### Changed
2022
+ - **`npm run setup`** — new script that builds and registers `clawdcursor` as a global command via `npm link`. Works on Windows, macOS, and Linux.
2023
+ - **Stop/kill port validation** — port input is now sanitized (parseInt + range check 1-65535) to prevent command injection
2024
+ - **Kill health verification** — kill command now verifies `/health` returns a Clawd Cursor response before force-killing
2025
+ - **Install instructions updated** — README and docs now use `npm run setup`
2026
+
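The port sanitization mentioned above (parseInt plus a 1-65535 range check) might look like the sketch below. The `parsePort` name is hypothetical, and the digits-only pre-check is an extra assumption added here so that trailing shell text is rejected instead of silently truncated by `parseInt`.

```typescript
// Return a valid TCP port, or null if the input is not a plain integer in 1..65535.
function parsePort(input: string): number | null {
  const trimmed = input.trim();
  // Assumption beyond the changelog: require digits only before parsing.
  if (!/^\d+$/.test(trimmed)) return null;
  const port = Number.parseInt(trimmed, 10);
  return port >= 1 && port <= 65535 ? port : null;
}
```

Only the returned number, never the raw string, should then be interpolated into any kill or stop command.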
2027
+ ### Test Results
2028
+ | Task | Pipeline Path | Steps | LLM Calls | Time | Result |
2029
+ |------|--------------|-------|-----------|------|--------|
2030
+ | Open Notepad | Action Router | 1 | 0 | 1.5s | ✅ |
2031
+ | Open Notepad + write haiku | Router → Smart Interaction → Computer Use | 6 | 7 | 58.8s | ✅ Verified |
2032
+ | Open Google Doc in Edge + write sentence | Browser → Computer Use | 17 | 9 | 78.8s | ✅ Verified |
2033
+
2034
+ ## [0.5.5] - 2026-02-26 — Install/Uninstall, OpenClaw Auto-Registration, Doctor UX
2035
+
2036
+ ### Added
2037
+ - **`clawdcursor install`** — one command to set up API key, configure pipeline, and register as OpenClaw skill
2038
+ - **`clawdcursor uninstall`** — clean removal of all config, data, and OpenClaw skill registration
2039
+ - **Doctor auto-registers as OpenClaw skill** — symlinks into `~/.openclaw/workspace/skills/clawdcursor`
2040
+ - **Doctor quick fix commands** — shows exact commands for missing text LLM and vision LLM in summary
2041
+ - **Dashboard favorites** — star commands to save them, click to re-run, persists across server restarts
2042
+ - **Credential detection** — warns when starring tasks that contain API keys or passwords
2043
+ - **OS tabs on website** — Windows/macOS/Linux with auto-detect
2044
+ - **Post-build help message** — shows all available commands after `npm run build`
2045
+ - **Dynamic OS detection** — system prompt uses actual OS instead of hardcoded "Windows 11" (thanks @molty)
2046
+
2047
+ ### Fixed
2048
+ - **Windows skill detection** — removed `requires.bins` from SKILL.md; OpenClaw's `hasBinary()` doesn't handle Windows PATHEXT (`.exe`/`.cmd`), causing the skill to show as "missing" even when node is installed
2049
+
2050
+ ### Changed
2051
+ - **SKILL.md rewritten** — agent identity shift framing, trigger lists, CDP direct path, async polling, error recovery
2052
+ - **Security hardened** — agents cannot self-approve confirm-tier actions, autonomous use scoped to read-only
2053
+ - **Privacy language clarified** — explicit per-provider data flow
2054
+ - **Website Get Started simplified** — 3 lines, commands shown in terminal post-build
2055
+ - **Anthropic text model updated** — `claude-haiku-4-5` (was `claude-3-5-haiku-20241022`)
2056
+
2057
+ ## [0.5.4] - 2026-02-25 — SKILL.md Rewrite + Security Hardening
2058
+
2059
+ ### Changed
2060
+ - **Privacy language clarified** — explicit per-provider data flow (Ollama = fully local, cloud = data to that API only)
2061
+ - **Added homepage and source URLs** to skill metadata
2062
+ - **Removed hard-coded paths** from SKILL.md
2063
+ - **Security section expanded** — includes localhost bind verification command
2064
+ - **Security scan addressed** — all flagged documentation gaps resolved
2065
+
2066
+ ## [0.5.3] - 2026-02-25 — SKILL.md Rewrite for Agent Autonomy
2067
+
2068
+ ### Changed
2069
+ - **SKILL.md rewritten** — agents now understand they have full desktop control and stop asking users to do things they can do themselves
2070
+ - **Agent identity shift framing** — blockquote at top overrides default "I can't do desktop things" behavior
2071
+ - **"When to Use This" trigger list** — comprehensive decision framework for when to reach for Clawd Cursor
2072
+ - **Two paths documented** — REST API (port 3847) for full desktop control, CDP Direct (port 9222) for fast browser reads
2073
+ - **Async flow clarified** — concrete polling pattern agents can follow step-by-step
2074
+ - **Error recovery table** — 8 common problems with exact solutions
2075
+ - **Expanded task examples** — cross-app workflows, data extraction, verification scenarios
2076
+ - **README** — added OpenClaw Integration section
2077
+
2078
+ ## [0.5.2] - 2026-02-25 — Web Dashboard + Browser Foreground Focus
2079
+
2080
+ ### Added
2081
+ - **Web Dashboard** — full single-page UI served at `GET /` (port 3847). Task submission, real-time logs, status indicators, approve/reject for safety confirmations, kill switch. Dark theme, fully responsive, zero external dependencies.
2082
+ - **`clawdcursor dashboard`** — CLI command to open the dashboard in your default browser
2083
+ - **`clawdcursor kill`** — CLI command to send a stop signal to the running server
2084
+ - **`GET /logs`** — API endpoint returning last 200 log entries with timestamps and levels
2085
+ - **Browser foreground focus** — Playwright navigation now brings Chrome to the front via `page.bringToFront()` + OS-level window activation (PowerShell `SetForegroundWindow` on Windows, `osascript` on macOS). The AI acts like a visible cursor — you see everything it does.
2086
+ - **Console hook** — `hookConsole()` intercepts all server logs for the dashboard log feed with auto-classification (error/success/warn/info)
2087
+
2088
+ ### Changed
2089
+ - **Smart task handoff** — Browser layer no longer uses regex word lists to detect multi-step tasks. Pure navigation ("open youtube") completes in browser layer; anything more complex falls through to SmartInteraction where the LLM plans the steps. No more missed verbs.
2090
+
2091
+ ### Architecture
2092
+ ```
2093
+ Layer 0: Browser (Playwright) — navigate + foreground focus
2094
+ ↓ more than navigation? → fall through
2095
+ Layer 1: Action Router — regex patterns, zero LLM calls
2096
+ ↓ no match? → fall through
2097
+ Layer 1.5: Smart Interaction — 1 LLM call plans steps, CDP/UIDriver executes
2098
+ ↓ failed? → fall through
2099
+ Layer 2: Accessibility Reasoner — reads UI tree, cheap LLM
2100
+ ↓ failed? → fall through
2101
+ Layer 3: Screenshot + Vision — full screenshot, Computer Use API
2102
+ ```
2103
+
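The fall-through behavior in the diagram above can be sketched as an ordered list of layers, where the first layer that reports success wins. This is a simplified, synchronous illustration: the `Layer` interface, `runPipeline`, and the stub handlers are assumptions, and real layers would be asynchronous.

```typescript
// Hypothetical layer contract: return true when the layer fully handled the task.
interface Layer {
  name: string;
  handle(task: string): boolean;
}

// Try each layer in order; return the name of the first one that handles the task.
function runPipeline(task: string, layers: Layer[]): string | null {
  for (const layer of layers) {
    try {
      if (layer.handle(task)) return layer.name;
    } catch {
      // A throwing layer is treated like "failed", so the next layer still gets a turn.
    }
  }
  return null;
}

// Stub layers standing in for the five layers in the diagram.
const layers: Layer[] = [
  { name: "Browser", handle: (t) => /^open youtube$/i.test(t) },
  { name: "Action Router", handle: (t) => /^open notepad$/i.test(t) },
  { name: "Smart Interaction", handle: () => { throw new Error("CDP unavailable"); } },
  { name: "Accessibility Reasoner", handle: () => false },
  { name: "Screenshot + Vision", handle: () => true },
];
```

Ordering the list from cheapest to most expensive means the screenshot-based layer only runs after every cheaper option has declined or failed.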
2104
+ ## [0.5.1] - 2026-02-23 — HD Screenshots + Focus Stability
2105
+
2106
+ ### Fixed
2107
+ - **HD screenshots** — LLM resolution increased from 1024px to 1280px (scale 2x instead of 2.5x). Claude can now reliably identify toolbar icons, buttons, and small UI elements.
2108
+ - **JPEG quality** — bumped from 55 to 65 for clearer icon identification
2109
+ - **Window focus stability** — `Win+D` minimizes all windows before task execution, preventing the Clawd terminal from stealing focus from target apps
2110
+ - **Paint drawing reliability** — pencil tool guidance in system prompt, mandatory checkpoint after tool selection
2111
+ - **Stale file cleanup** — restored `get-windows.ps1` shim (still referenced by accessibility.ts), removed dead `setup.ps1` and `get-ui-tree.ps1`
2112
+
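The scale figures in the HD screenshots entry (2.5x at 1024px, 2x at 1280px) are consistent with a 2560px-wide screen. The sketch below works through that arithmetic and shows how a coordinate chosen on the downscaled image maps back to screen pixels. The 2560px width is inferred, not stated in the changelog, and the helper names are hypothetical.

```typescript
interface Point {
  x: number;
  y: number;
}

// Ratio between the physical screen width and the downscaled image sent to the model.
function scaleFactor(screenWidth: number, llmWidth: number): number {
  return screenWidth / llmWidth;
}

// Map a coordinate picked on the downscaled image back to screen pixels.
function toScreen(p: Point, factor: number): Point {
  return { x: Math.round(p.x * factor), y: Math.round(p.y * factor) };
}

const SCREEN_WIDTH = 2560; // assumed; inferred from the 2x and 2.5x figures
const oldFactor = scaleFactor(SCREEN_WIDTH, 1024); // 2.5
const newFactor = scaleFactor(SCREEN_WIDTH, 1280); // 2
const click = toScreen({ x: 640, y: 360 }, newFactor); // { x: 1280, y: 720 }
```

A smaller factor means each image pixel covers fewer screen pixels, which is why small toolbar icons become easier to identify.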
2113
+ ### Performance (Paint stickman benchmark)
2114
+ | Metric | v0.5.0 | v0.5.1 |
2115
+ |--------|--------|--------|
2116
+ | Time | ~250s | **55s** |
2117
+ | API calls | 30 | **6** |
2118
+ | Success rate | ~50% | ~90% |
2119
+
2120
+ ## [0.5.0] - 2026-02-23 — Smart Pipeline + Doctor + Batch Execution
2121
+
2122
+ ### Added
2123
+ - **`clawdcursor doctor`** — auto-diagnoses setup, tests models, configures optimal pipeline
2124
+ - **3-layer pipeline** — Action Router → Accessibility Reasoner → Screenshot fallback
2125
+ - **Layer 2: Accessibility Reasoner** (`src/a11y-reasoner.ts`) — text-only LLM reads the UI tree, no screenshots needed. Uses cheap models (Haiku, Qwen, GPT-4o-mini).
2126
+ - **Batch action execution** — Claude returns multiple actions per response (3.6 avg), skipping screenshots between batched actions. Drawing tasks execute 10+ actions in a single API call.
2127
+ - **Focus hints** — each screenshot includes a FOCUS directive telling Claude where to look, reducing output tokens and decision time
2128
+ - **Auto-maximize** — apps launched via Action Router are automatically maximized (`Win+Up`) for consistent layout
2129
+ - **Region capture** — `captureRegionForLLM()` crops screenshots to specific areas (2-30KB vs 58KB full)
2130
+ - **Checkpoint strategy** — screenshots only after critical state changes (app open, dialog appear), not after every action
2131
+ - **Multi-provider support** — Anthropic, OpenAI, Ollama (local/free), Kimi. Same codebase, auto-detected.
2132
+ - **Provider model map** (`src/providers.ts`) — auto-selects cheap/expensive models per provider
2133
+ - **Self-healing** — doctor falls back if a model is unavailable (e.g., Haiku → Qwen). Circuit breaker disables failing layers at runtime.
2134
+ - **Streaming LLM responses** — early JSON return saves 1-3s per call
2135
+ - **Combined accessibility script** (`scripts/get-screen-context.ps1`) — 1 PowerShell spawn instead of 3
2136
+ - **Benchmark harness** (`test-perf-comparison.ts`)
2137
+
2138
+ ### Performance
2139
+ - Screenshots: 120KB → ~80KB, 1280px target (HD for reliable icon identification)
2140
+ - JPEG quality: 70 → 65
2141
+ - Delays: 200-1500ms → 50-600ms across the board
2142
+ - System prompts: ~60% smaller (fewer tokens per call)
2143
+ - Accessibility tree: filtered to interactive elements only, 3000 char cap
2144
+ - Taskbar cache: 30s TTL (was queried every call)
2145
+ - Screen context cache: 500ms → 2s TTL
2146
+
2147
+ ### Benchmarks
2148
+
2149
+ | Task | v0.4 | v0.5 (Ollama, $0) | v0.5 (Anthropic) | v0.5 + Batch |
2150
+ |------|------|--------|---------|---------|
2151
+ | Calculator | 43s | 2.6s | 20.1s | — |
2152
+ | Notepad | 73s | 2.0s | 54.2s | — |
2153
+ | File Explorer | 53s | 1.9s | 22.1s | — |
2154
+ | Paint stickman | ~250s (30 calls) | — | ~124s (19 calls) | **101s (11 calls)** |
2155
+ | GitHub profile | — | — | ~106s (15 calls) | — |
2156
+
2157
+ ## [0.4.0] - 2026-02-22 — Native Desktop Control
2158
+
2159
+ **VNC removed.** Clawd Cursor now controls the desktop natively via @nut-tree-fork/nut-js. No VNC server required.
2160
+
2161
+ ### Breaking Changes
2162
+ - `--vnc-host`, `--vnc-port`, `--vnc-password` CLI flags removed
2163
+ - `VNC_PASSWORD`, `VNC_HOST`, `VNC_PORT` environment variables no longer used
2164
+ - `rfb2` dependency removed
2165
+ - `setup.ps1` no longer installs TightVNC
2166
+
2167
+ ### Added
2168
+ - `NativeDesktop` class (`src/native-desktop.ts`) — drop-in replacement for VNCClient
2169
+ - Direct screen capture via @nut-tree-fork/nut-js (~50ms vs ~850ms)
2170
+ - Direct mouse/keyboard control via OS-level APIs
2171
+ - Simplified onboarding: `npm install && npm start`
2172
+
2173
+ ### Performance
2174
+ - Screenshots: ~850ms → ~50ms (17× faster)
2175
+ - Connect time: ~200ms → ~38ms (5× faster)
2176
+ - Simple task (Google Docs sentence): ~120s → ~102s
2177
+ - Complex task (GitHub → Notepad → save): ~200s → ~156s
2178
+
2179
+ ### Removed
2180
+ - VNC server dependency (TightVNC)
2181
+ - `rfb2` npm package
2182
+ - VNC-related CLI flags and environment variables
2183
+ - BGRA→RGBA color swap (nut-js returns RGBA natively)
2184
+
2185
+ ## [0.3.3] - 2025-03-15
2186
+
2187
+ ### Bulletproof Headless Setup
2188
+ - setup.ps1 now completes end-to-end in a single run on fresh systems, even in non-interactive/headless AI agent shells
2189
+ - Generate random VNC password when `--vnc-password` not provided non-interactively
2190
+ - Replace `Start-Process -NoNewWindow -Wait` with `-PassThru -WindowStyle Hidden` + try/catch (msiexec crash fix)
2191
+ - Wrap `Start-Service` in its own try/catch (post-install crash fix)
2192
+ - Replace all emoji with ASCII tags for cp1252 headless terminal compatibility
2193
+
2194
+ ## [0.3.1] - 2025-03-10
2195
+
2196
+ ### SKILL.md Security Hardening
2197
+ - Added YAML frontmatter, explicit credential declarations, privacy disclosure, and security considerations for ClawHub publishing.
2198
+
2199
+ ## [0.3.0] - 2025-03-01
2200
+
2201
+ ### Performance Optimizations (~70% faster)
2202
+ - Screenshot hash cache — skips LLM calls when the screen hasn't changed
2203
+ - Adaptive VNC frame wait — captures in ~200ms instead of fixed 800ms
2204
+ - Parallel screenshot + accessibility fetch — runs concurrently via Promise.all
2205
+ - Accessibility context cache — 500ms TTL eliminates redundant PowerShell queries
2206
+ - Async debug writes — no longer blocks the event loop
2207
+ - Exponential backoff with jitter — better retry resilience for API calls
2208
+
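The "exponential backoff with jitter" entry does not say which variant is used. The sketch below shows one common form ("full jitter"), with the base delay, cap, and function name all chosen for illustration.

```typescript
// Full-jitter backoff: a random delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelay(
  attempt: number,
  baseMs: number = 500, // assumed base delay
  capMs: number = 30000, // assumed upper bound
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Passing the random source as a parameter keeps the function deterministic under test, and the jitter spreads out retries that would otherwise fire at the same moment.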
2209
+ ## [0.2.0] - 2025-02-21
2210
+
2211
+ ### 🚀 Major: Anthropic Computer Use API
2212
+
2213
+ Clawd Cursor now supports Anthropic's native Computer Use API (`computer_20250124`) as the **primary execution path**. This is a fundamentally different approach — the full task goes directly to Claude with native computer use tools. No decomposition, no routing. Claude sees screenshots, plans, and executes natively.
2214
+
2215
+ ### Dual Execution Paths
2216
+
2217
+ The agent now has two separate code paths selected by provider:
2218
+
2219
+ - **Path A — Computer Use API** (`--provider anthropic`): Full task sent to Claude with `computer_20250124` tool. Claude sees the screen, plans multi-step sequences, and executes them natively. Handles complex, multi-app workflows reliably.
2220
+ - **Path B — Decompose + Action Router** (`--provider openai` / offline): Original approach from v0.1.0. Parse task → subtasks → Action Router (UI Automation, zero LLM) → Vision fallback. Faster and cheaper for simple tasks, works without an API key.
2221
+
2222
+ ### Added
2223
+
2224
+ - **Anthropic Computer Use integration** — native `computer_20250124` tool type with `anthropic-beta: computer-use-2025-01-24` header
2225
+ - **Adaptive delays** — per-action timing: 1000ms for app launch, 800ms for navigation, 100ms for typing, 300ms default
2226
+ - **Verification hints** — post-action verification prompts after each Computer Use step
2227
+ - **Mouse drag** — `mouseDrag`, `mouseDown`, `mouseUp` with smooth interpolation between points
2228
+ - **Bulletproof system prompt** — planning rules, ctrl+l for URL navigation, recovery strategies for failed actions
2229
+ - **Display scaling** — automatic resolution scaling to 1280×720 for Computer Use API compatibility
2230
+ - **Vision model** — `claude-sonnet-4-20250514` for Computer Use path
2231
+
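The adaptive delays entry above maps action categories to wait times. A minimal sketch using the four values from the changelog; the `ActionKind` names and the lookup helper are assumptions.

```typescript
// Hypothetical action categories; only the delay values come from the changelog.
type ActionKind = "app_launch" | "navigation" | "typing" | "click" | "scroll";

const DELAYS_MS: Partial<Record<ActionKind, number>> = {
  app_launch: 1000,
  navigation: 800,
  typing: 100,
};
const DEFAULT_DELAY_MS = 300;

// Look up the post-action delay, falling back to the default for unlisted kinds.
function delayFor(kind: ActionKind): number {
  return DELAYS_MS[kind] ?? DEFAULT_DELAY_MS;
}
```

Keeping the table partial means new action kinds automatically get the 300ms default until someone tunes them.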
2232
+ ### Test Results
2233
+
2234
+ | Task | Time | API Calls | Result |
2235
+ |------|------|-----------|--------|
2236
+ | Google Docs: open Chrome, go to Docs, write a paragraph | 187s | 14 | ✅ All succeeded |
2237
+ | GitHub: open Chrome, navigate to profile, screenshot | 102s | — | ✅ All succeeded |
2238
+ | Notepad: open, write haiku, save to desktop | ~180s | — | ✅ File saved correctly |
2239
+ | Paint: draw a stick figure | ~90s | 16 | ✅ Drawing completed |
2240
+
2241
+ ### Breaking Changes
2242
+
2243
+ - **Provider selection now determines execution path.** `--provider anthropic` uses Computer Use API (Path A). `--provider openai` or no provider uses the original Decompose + Action Router pipeline (Path B). This is a fundamental change in behavior — the same task will execute via completely different code paths depending on the provider.
2244
+
2245
+ ### Performance Characteristics
2246
+
2247
+ | | Path A (Computer Use) | Path B (Action Router) |
2248
+ |---|---|---|
2249
+ | Best for | Complex multi-step tasks | Simple single-action tasks |
2250
+ | Reliability | Very high | Good for supported patterns |
2251
+ | Speed | ~90–190s for complex tasks | ~2s for simple tasks |
2252
+ | Cost | Higher (multiple API calls with screenshots) | Lower (1 text call or zero) |
2253
+ | Offline | No | Yes (for common patterns) |
2254
+
2255
+ ## [0.1.0] - 2025-01-15
2256
+
2257
+ ### Initial Release
2258
+
2259
+ - Action Router with Windows UI Automation — 80% of common tasks with zero LLM calls
2260
+ - Vision fallback for complex/unfamiliar UI
2261
+ - Smart task decomposition (single text-only LLM call)
2262
+ - Three-tier safety system (Auto / Preview / Confirm)
2263
+ - REST API and CLI interface
2264
+ - Windows setup script