@simular-ai/simulang-js 8.0.0 → 9.0.0

package/CHANGELOG.md CHANGED
@@ -7,6 +7,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [9.0.0] - 2026-06-29
+
+ ### Added
+
+ - `Image.fromBase64()` now accepts GIF and WebP data URLs/payloads.
+ - `Image.base64DataUrl()` and `Screenshot.base64DataUrl()` for returning MIME-prefixed data URLs.
+ - Bundled OpenRouter config now advertises `openrouter_claude_opus` for both `AskModel` and `GroundingModel`, including Claude-specific grounding coordinate scaling and image size/format limits.
+ - `StateSatisfiesModel` for evaluating screen state against a natural-language condition through the dedicated state-satisfaction provider.
+
+ ### Changed
+
+ - `Image.base64()` and `Screenshot.base64()` now return raw base64 without a `data:image/...;base64,` prefix, matching the Rust API. Use `base64DataUrl()` when a MIME-prefixed data URL is needed.
+
+ ## [8.1.0] - 2026-06-23
+
+ ### Added
+
+ - `AccessibilityNode.parent()`, `ancestors()`, direct/strict ancestry checks, and `lowestCommonAncestor(other)` for navigating accessibility-tree relationships. `parent()` returns `null` when a node has no parent, `ancestors()` stops only when the parent chain is fully resolved, and both throw on lookup failure; `lowestCommonAncestor` returns a `[node, level]` tuple where `level` is from the reached parentless node, returns `null` only for resolved unrelated trees, and throws if either parent chain cannot be resolved.
+ - `Window.screenshot(hideCursor)` — captures just this window's pixels from its own backing store (macOS `ScreenCaptureKit` window filter / Windows `Windows.Graphics.Capture`), so occluding windows don't bleed through and hardware-accelerated content (Chrome, Electron, D3D apps) is captured correctly; off-display or minimized windows throw.
+ - `Window.ground(model, concept)` — locate a concept within this window and return its global desktop coordinates (sugar for `screenshot(true).ground(model, concept)`); restricting the search to the window's bounds is faster and more accurate than grounding a full-screen screenshot.
+ - `AccessibilityNode.url` — the node's raw hyperlink target as a `string`, or `null` when the node isn't a link, has no target, or the platform doesn't expose it.
+ - `Image.drawDot(x, y, radius, red, green, blue)` and `Screenshot.drawDot(...)` — paint a filled opaque RGB disc at an image-pixel coordinate (companion to `drawGrid`; `radius` 0 paints a single pixel, out-of-bounds points are no-ops). Handy for annotating grounding results.
+
+ ### Changed
+
+ - Frame-relative input on a `Window` (`moveMouse`, `click`) now throws when the target point maps off every connected display, instead of letting the OS silently clamp the cursor to a screen edge and click the wrong location.
+
  ## [8.0.0] - 2026-06-05
 
  ### Added
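The 9.0.0 "Changed" entry means a value obtained from `base64()` no longer carries a `data:image/...;base64,` prefix. Code that persisted or forwarded the old prefixed strings may have to accept both shapes for a while. The following is a minimal sketch of that normalisation, assuming only what the changelog states; `stripDataUrlPrefix` and `ensureDataUrl` are hypothetical helper names and are not part of the package.

```typescript
// Hypothetical migration helpers; not part of @simular-ai/simulang-js.

// Matches a leading `data:image/<subtype>;base64,` prefix.
const DATA_URL_PREFIX = /^data:image\/[a-z0-9.+-]+;base64,/i;

// Return the raw base64 payload whether or not a data-URL prefix is present.
function stripDataUrlPrefix(value: string): string {
  return value.replace(DATA_URL_PREFIX, "");
}

// Add a data-URL prefix to a raw payload; already-prefixed input is returned as-is.
// The default MIME type is an assumption of this sketch.
function ensureDataUrl(value: string, mime: string = "image/png"): string {
  return DATA_URL_PREFIX.test(value) ? value : `data:${mime};base64,${value}`;
}
```

For freshly captured images the changelog's own advice applies: call `base64DataUrl()` when a prefixed value is needed. Helpers like these are only useful for strings produced before the upgrade.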
package/CLAUDE.md CHANGED
@@ -2,7 +2,7 @@
 
  Node.js bindings for the Rust `simulang-rs` crate (via napi-rs). Cross-platform
  desktop automation: apps, windows, accessibility trees, mouse/keyboard,
- screenshots, clipboard, audio, and VLM/STT model access.
+ screenshots, clipboard, audio, and VLM/LLM/STT model access.
 
  This file ships in the npm tarball alongside `index.d.ts` and is versioned
  with it.
@@ -10,7 +10,7 @@ with it.
  ## Where the API is documented
 
  Read **`index.d.ts`** first — it is the source of truth. Every class,
- function, and enum is fully typed (~1500 lines) and carries JSDoc covering
+ function, and enum is fully typed (~2100 lines) and carries JSDoc covering
  idioms, lifecycle rules, platform quirks, and inter-API trade-offs that types
  alone can't express. The JSDoc is generated from doc comments,
  so the per-symbol guidance there is authoritative — trust it over any
@@ -24,12 +24,22 @@ restatement elsewhere.
  - Many objects are **handles to platform resources** (windows, audio devices,
  accessibility trees, file/directory handles). Their lifetime matters;
  dropping them can free the underlying resource.
- - Coordinates are uniform across platforms: **top-left origin, global
- physical pixels** (also called device pixels — the raw hardware pixels of
- the display, not the logical / CSS / point units used in browsers and
- some desktop UI frameworks). On a HiDPI display, a 1920×1080-logical
- screen is 3840×2160 in these coordinates. Image and screenshot dimensions
- are likewise in physical pixels.
+ - Coordinates live on the **global desktop**: top-left origin at `(0, 0)` on
+ the primary monitor, in **OS-native units** — the unit is **not** the same
+ on every platform:
+ - **Windows / Linux**: **physical pixels** (raw hardware pixels).
+ - **macOS**: **logical points** — on a 2× Retina display one point spans two
+ hardware pixels, so a 1920×1080-logical screen is `1920×1080` here, not
+ `3840×2160`.
+
+ These are the native units the OS input/accessibility APIs expect, **not**
+ the browser logical/CSS pixel. Within a single OS every API speaks that OS's
+ unit, so coordinates round-trip between `MouseController`, the various
+ `boundingBox()` methods, `Screenshot.toGlobalDesktopCoordinates()`, and
+ grounding output **without conversion**; only code crossing into a different
+ coordinate system (e.g. an Electron overlay measured in CSS pixels) must
+ account for the per-OS unit. Monitors arranged to the left of / above the
+ primary contribute **negative** coordinates, so don't assume `x, y >= 0`.
 
  ## Logging is on by default
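The rewritten coordinate note in `CLAUDE.md` says that only code crossing in from another coordinate system, such as an overlay measured in CSS pixels, has to account for the per-OS unit. The sketch below shows one way such a conversion could look. It rests on loud assumptions that the diff does not confirm: that at zoom factor 1 a CSS pixel equals one macOS point, that a single `scaleFactor` applies to the display under the overlay on Windows/Linux, and that the overlay's origin is already known in OS-native units. `cssOffsetToNative` and `overlayPointToGlobal` are hypothetical names.

```typescript
// Hypothetical conversion helpers; not part of @simular-ai/simulang-js.
type Platform = "darwin" | "win32" | "linux";

interface Point {
  x: number;
  y: number;
}

// Scale a distance measured in CSS pixels into OS-native units.
// Assumption: macOS points == CSS pixels at zoom 1; elsewhere multiply by the
// display scale factor to reach physical pixels.
function cssOffsetToNative(offset: Point, platform: Platform, scaleFactor: number): Point {
  const k = platform === "darwin" ? 1 : scaleFactor;
  return { x: Math.round(offset.x * k), y: Math.round(offset.y * k) };
}

// Add the scaled offset to an origin that is already in OS-native units.
// The origin may be negative when the overlay sits left of / above the primary monitor.
function overlayPointToGlobal(
  originNative: Point,
  cssOffset: Point,
  platform: Platform,
  scaleFactor: number,
): Point {
  const d = cssOffsetToNative(cssOffset, platform, scaleFactor);
  return { x: originNative.x + d.x, y: originNative.y + d.y };
}
```

Mixed-DPI multi-monitor layouts on Windows need more care than a single multiplier, so treat this as an illustration of where the per-OS unit enters, not as a general solution.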
package/index.d.ts CHANGED
@@ -61,6 +61,13 @@ export declare class AccessibilityNode {
  get automationId(): string
  /** Whether the node accepts user input. */
  get isEnabled(): boolean
+ /**
+ * Hyperlink target of this node, or `null` when the node is not a link,
+ * the link has no target, or the platform backend does not expose the
+ * target through the accessibility API. The string is the raw URL as
+ * reported by the platform (no parsing or normalisation).
+ */
+ get url(): string | null
  /**
  * Live bounding box of the element on the global desktop, in the canonical
  * coordinate space (OS-native units; see
@@ -74,6 +81,31 @@ export declare class AccessibilityNode {
  * simulang-rs's convention of treating walk failures as "no children").
  */
  children(): Array<AccessibilityNode>
+ /**
+ * Direct parent node, or `null` if this node has no parent. Parent lookup
+ * failures throw.
+ */
+ parent(): AccessibilityNode | null
+ /**
+ * Parent chain for this node, nearest parent first, excluding this node.
+ * Stops when a node has no parent; parent lookup failures throw.
+ */
+ ancestors(): Array<AccessibilityNode>
+ /** Whether this node is the direct parent of `other`. */
+ isParentOf(other: AccessibilityNode): boolean
+ /** Whether this node is a direct child of `other`. */
+ isChildOf(other: AccessibilityNode): boolean
+ /** Whether this node is a strict ancestor of `other`. */
+ isAncestorOf(other: AccessibilityNode): boolean
+ /** Whether this node is a strict descendant of `other`. */
+ isDescendantOf(other: AccessibilityNode): boolean
+ /**
+ * Lowest shared ancestor for this node and `other`, plus that ancestor's
+ * level from the reached parentless node (`parentless = 0`, its children
+ * `= 1`). Returns `null` when both chains resolve and no shared ancestor
+ * exists; lookup failures throw.
+ */
+ lowestCommonAncestor(other: AccessibilityNode): [AccessibilityNode, number] | null
  /**
  * Render this node's accessibility subtree as an indented
  * Playwright-style aria snapshot string.
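The `[node, level]` return shape of `lowestCommonAncestor` is easier to read against a toy tree. The sketch below models the documented semantics on a plain in-memory structure: `ancestors()` is nearest parent first and excludes the node, the level counts from the parentless node (`0`), and unrelated trees give `null`. `TreeNode` is a hypothetical stand-in, not the package's `AccessibilityNode`, and one detail is an assumption of this sketch: each path includes the node itself, so a node can be its own shared ancestor with a descendant. The diff does not say how the package treats that case.

```typescript
// Hypothetical stand-in for an accessibility node; not the package's class.
class TreeNode {
  readonly name: string;
  readonly parentNode: TreeNode | null;

  constructor(name: string, parentNode: TreeNode | null = null) {
    this.name = name;
    this.parentNode = parentNode;
  }

  // Parent chain, nearest parent first, excluding this node.
  ancestors(): TreeNode[] {
    const chain: TreeNode[] = [];
    for (let p = this.parentNode; p !== null; p = p.parentNode) chain.push(p);
    return chain;
  }
}

// Lowest shared ancestor plus its level from the parentless node (root = 0).
function lowestCommonAncestor(a: TreeNode, b: TreeNode): [TreeNode, number] | null {
  // Root-first paths; including the node itself is an assumption of this sketch.
  const pathA = [a, ...a.ancestors()].reverse();
  const pathB = [b, ...b.ancestors()].reverse();
  let level = -1;
  while (
    level + 1 < pathA.length &&
    level + 1 < pathB.length &&
    pathA[level + 1] === pathB[level + 1]
  ) {
    level++;
  }
  // level stays at -1 when even the roots differ, i.e. unrelated trees.
  return level < 0 ? null : [pathA[level], level];
}
```

Unlike the real API, this model cannot fail a parent lookup, so the "throws on lookup failure" branch has no counterpart here.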
@@ -315,13 +347,15 @@ export declare class AskModel {
  /**
  * First LLM model advertised by the first LLM-capable provider in the
  * loaded configuration whose credentials are currently available.
- * Throws if no provider in the loaded config advertises an LLM service.
+ * Providers with missing credentials are skipped; throws if no provider
+ * remains available.
  */
  static default(): AskModel
  /**
  * Resolve a model alias against the loaded config (e.g.
- * `"openrouter_gpt_4o_mini"` from the bundled `openrouter` provider, or
- * any alias declared by a user provider). Throws if the alias is unknown.
+ * `"openrouter_gpt_4o_mini"` / `"openrouter_claude_opus"` from the
+ * bundled `openrouter` provider, or any alias declared by a user provider).
+ * Throws if the alias is unknown.
  */
  static byAlias(alias: string): AskModel
  /**
@@ -352,8 +386,8 @@ export declare class AskModel {
  * - `text`: optional accessibility-tree (or any) text included as
  * structural context. Pass `null` / omit to skip.
  * - `images`: zero or more images to attach. Each is encoded as a
- * base64 data URL and sent as an `image_url` chat content part. Pass
- * `null` / omit for none.
+ * base64 data URL in a model-supported format and sent as an
+ * `image_url` chat content part. Pass `null` / omit for none.
  *
  * Returns the trimmed assistant response on success.
  */
@@ -532,14 +566,14 @@ export declare class File {
  export declare class GroundingModel {
  /**
  * First VLM model advertised by the first VLM provider in the loaded
- * config. Throws if no provider in the loaded config advertises a VLM
- * service.
+ * configuration whose credentials are currently available. Providers with
+ * missing credentials are skipped; throws if no provider remains available.
  */
  static default(): GroundingModel
  /**
  * Resolve a model alias against the loaded config (e.g. `"ui_venus_30b"`,
- * `"ui_tars_7b"`, or any alias declared by a user provider). Throws if
- * the alias is unknown.
+ * `"ui_tars_7b"`, `"openrouter_claude_opus"`, or any alias declared by a
+ * user provider). Throws if the alias is unknown.
  */
  static byAlias(alias: string): GroundingModel
  /**
@@ -595,6 +629,21 @@ export declare class Image {
  * Grid squares have the specified `width` and `height`.
  */
  drawGrid(width: number, height: number): void
+ /**
+ * Paints a filled disc on the image. Useful for visualizing point
+ * coordinates returned from grounding, layout queries, etc.
+ *
+ * `x` / `y` are image-pixel coordinates of the disc's centre. `radius`
+ * is the disc radius in pixels (`0` paints a single pixel at the
+ * centre). `(red, green, blue)` is the fill color; alpha is always 255
+ * (opaque replacement of the underlying pixel).
+ *
+ * Coordinates that fall outside the image bounds (negative, or past the
+ * width / height) silently produce no pixel, so the helper is safe to
+ * call with the raw output of a grounding model even at the edge of the
+ * captured rect.
+ */
+ drawDot(x: number, y: number, radius: number, red: number, green: number, blue: number): void
  /**
  * Compress the image by converting it to JPEG with the specified quality.
  *
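The `drawDot` JSDoc above describes three behaviours: a radius of `0` paints one pixel, the fill is fully opaque, and anything outside the image is silently skipped. The sketch below reproduces those behaviours on a plain RGBA buffer to make the clipping rule concrete. It is not the package's implementation: the `RgbaImage` type is hypothetical, integer coordinates are assumed, and the inside test `dx² + dy² <= radius²` is one possible rasterisation rule that the diff does not specify.

```typescript
// Hypothetical RGBA buffer; 4 bytes per pixel, row-major.
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8ClampedArray;
}

// Paint a filled opaque disc centred on (cx, cy); out-of-bounds pixels are skipped.
function drawDot(
  img: RgbaImage,
  cx: number,
  cy: number,
  radius: number,
  red: number,
  green: number,
  blue: number,
): void {
  for (let dy = -radius; dy <= radius; dy++) {
    for (let dx = -radius; dx <= radius; dx++) {
      // Assumed inside test; radius 0 leaves only dx = dy = 0.
      if (dx * dx + dy * dy > radius * radius) continue;
      const x = cx + dx;
      const y = cy + dy;
      // Clip per pixel so a disc straddling an edge is partially drawn.
      if (x < 0 || y < 0 || x >= img.width || y >= img.height) continue;
      const i = (y * img.width + x) * 4;
      img.data[i] = red;
      img.data[i + 1] = green;
      img.data[i + 2] = blue;
      img.data[i + 3] = 255; // always opaque, replacing the underlying pixel
    }
  }
}
```

Clipping pixel by pixel, instead of rejecting the whole call when the centre is out of bounds, is what makes it safe to pass a grounding result that sits on the edge of the captured rect.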
@@ -616,19 +665,21 @@ export declare class Image {
  shrink(nwidth: number, nheight: number): void
  /** Returns the image dimensions as `[width, height]` in pixels. */
  get dimensions(): [number, number]
+ /** Returns the image encoded as raw base64, without a MIME prefix. */
+ base64(): string
  /**
  * Returns the image encoded as a base64 data URL.
  *
  * The result includes the MIME prefix, for example
- * `data:image/png;base64,...` or `data:image/jpeg;base64,...`.
+ * `data:image/png;base64,...`, `data:image/jpeg;base64,...`,
+ * `data:image/gif;base64,...`, or `data:image/webp;base64,...`.
  */
- base64(): string
+ base64DataUrl(): string
  /**
  * Decodes a base64 image string into an image.
  *
- * Accepts either a raw base64 payload or a data URL such as
- * `data:image/png;base64,...`, `data:image/jpeg;base64,...`, or
- * `data:image/jpg;base64,...`.
+ * Accepts either a raw base64 payload or a PNG, JPEG/JPG, GIF, or WebP
+ * data URL.
  */
  static fromBase64(base64: string): Image
  /**
@@ -1072,6 +1123,21 @@ export declare class Screenshot {
  * Grid squares have the specified `width` and `height`.
  */
  drawGrid(width: number, height: number): void
+ /**
+ * Paints a filled disc on the image. Useful for visualizing point
+ * coordinates returned from grounding, layout queries, etc.
+ *
+ * `x` / `y` are image-pixel coordinates of the disc's centre. `radius`
+ * is the disc radius in pixels (`0` paints a single pixel at the
+ * centre). `(red, green, blue)` is the fill color; alpha is always 255
+ * (opaque replacement of the underlying pixel).
+ *
+ * Coordinates that fall outside the image bounds (negative, or past the
+ * width / height) silently produce no pixel, so the helper is safe to
+ * call with the raw output of a grounding model even at the edge of the
+ * captured rect.
+ */
+ drawDot(x: number, y: number, radius: number, red: number, green: number, blue: number): void
  /**
  * Compress the image by converting it to JPEG with the specified quality.
  *
@@ -1093,13 +1159,16 @@ export declare class Screenshot {
  shrink(nwidth: number, nheight: number): void
  /** Returns the screenshot dimensions as `[width, height]` in pixels. */
  get dimensions(): [number, number]
+ /** Returns the image encoded as raw base64, without a MIME prefix. */
+ base64(): string
  /**
  * Returns the image encoded as a base64 data URL.
  *
  * The result includes the MIME prefix, for example
- * `data:image/png;base64,...` or `data:image/jpeg;base64,...`.
+ * `data:image/png;base64,...`, `data:image/jpeg;base64,...`,
+ * `data:image/gif;base64,...`, or `data:image/webp;base64,...`.
  */
- base64(): string
+ base64DataUrl(): string
  /**
  * Converts a point in this screenshot to global desktop coordinates.
  *
@@ -1148,6 +1217,44 @@ export declare class ScreenshotCoordinateType {
  static normalized(range: number): ScreenshotCoordinateType
  }
 
+ /**
+ * A dedicated state-satisfaction model that evaluates whether observed screen
+ * state satisfies a natural-language condition.
+ *
+ * This speaks Simular's `v1/perception/state_satisfies` endpoint. It is
+ * separate from `AskModel`, which remains an OpenAI-compatible
+ * chat-completions client.
+ */
+ export declare class StateSatisfiesModel {
+ /**
+ * First state-satisfaction model advertised by the first capable provider
+ * in the loaded configuration whose credentials are currently available.
+ * Throws if no provider advertises the state-satisfies service.
+ */
+ static default(): StateSatisfiesModel
+ /**
+ * Resolve a model alias against the loaded config. Throws if the alias is
+ * unknown.
+ */
+ static byAlias(alias: string): StateSatisfiesModel
+ /** Convenience shim for the bundled `simular_state_satisfies` alias. */
+ static simularStateSatisfies(): StateSatisfiesModel
+ /**
+ * Every state-satisfaction model alias accepted by `byAlias` on this
+ * machine, deduplicated and sorted alphabetically.
+ */
+ static availableAliases(): Array<string>
+ /** The provider-configured model identifier. */
+ get name(): string
+ /** Check that the provider credentials work. Throws if they do not. */
+ checkAuth(): void
+ /**
+ * Evaluate `condition` against the supplied accessibility text and raw
+ * screenshot base64.
+ */
+ stateSatisfies(condition: string, text: string, image: string): boolean
+ }
+
  /** A speech-to-text model that can transcribe audio. */
  export declare class SttModel {
  /**
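`stateSatisfies(condition, text, image)` takes accessibility text plus a raw base64 screenshot and returns a boolean, which makes it a natural fit for "retry until the screen looks right" loops. The sketch below shows that pattern against a local `StateChecker` interface that mirrors only the one method signature from the declaration above. Everything else is hypothetical: the `Observation` shape, the `observe` callback, and the helper name `firstSatisfyingAttempt` do not come from the package.

```typescript
// Mirrors the declared method signature only; the real class has more members.
interface StateChecker {
  stateSatisfies(condition: string, text: string, image: string): boolean;
}

// Hypothetical bundle of what a caller captures on each attempt.
interface Observation {
  text: string; // e.g. an accessibility snapshot string
  imageBase64: string; // raw base64, no data-URL prefix
}

// Re-observe and re-check up to maxAttempts times.
// Returns the 1-based attempt that satisfied the condition, or null if none did.
function firstSatisfyingAttempt(
  model: StateChecker,
  condition: string,
  observe: () => Observation,
  maxAttempts: number,
): number | null {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { text, imageBase64 } = observe();
    if (model.stateSatisfies(condition, text, imageBase64)) return attempt;
  }
  return null;
}
```

The sketch deliberately has no delay between attempts; a real caller would probably wait between observations and decide how to handle a thrown error from the model.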
@@ -1292,6 +1399,16 @@ export declare class Window {
  * top-left of [`Window.boundingBox`], which includes the title bar and
  * other OS chrome.
  *
+ * Two validation passes run before any input is synthesized, and either
+ * failing throws:
+ * 1. `(x, y)` must lie inside the window frame (`[0, width) × [0,
+ * height)`).
+ * 2. The resulting global point must fall on at least one connected
+ * display. Windows dragged partly off-screen can map an in-frame
+ * point to a coordinate the OS would silently clamp to a display
+ * edge — erroring out is strictly more useful than moving the cursor
+ * somewhere unintended and clicking the wrong thing.
+ *
  * Does **not** focus the window. Call [`Window.focus`] (or
  * [`Instance.focus`]) first if the click target requires the window
  * to be active.
@@ -1310,7 +1427,9 @@ export declare class Window {
  * Move the cursor to `(x, y)` inside this window and synthesise a
  * mouse button event. Sugar for `moveMouse(x, y); button(...)`.
  *
- * Coordinates are window-frame-relative (see [`Window.moveMouse`]), and
- * are subject to the same bounds / on-screen validation: an off-frame or
- * off-screen target throws instead of clicking the wrong location.
+ * Coordinates are window-frame-relative (see [`Window.moveMouse`]), and
+ * are subject to the same bounds / on-screen validation: an off-frame or
+ * off-screen target throws instead of clicking the wrong location.
  * Does not focus the window.
  */
  click(x: number, y: number, button: Button, direction: Direction): void
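The two validation passes documented for `Window.moveMouse`, and inherited by `click`, can be pictured as a pure function over rectangles. The sketch below is an illustration under assumptions, not the package's code: `Rect`, `contains`, and `frameToGlobal` are hypothetical names, display rectangles are treated as half-open like the documented frame range, and `RangeError` is an arbitrary choice of error type.

```typescript
// Hypothetical rectangle in OS-native units; x/y may be negative.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Half-open containment test: [x, x + width) × [y, y + height).
function contains(rect: Rect, px: number, py: number): boolean {
  return px >= rect.x && px < rect.x + rect.width && py >= rect.y && py < rect.y + rect.height;
}

// Map a frame-relative point to global coordinates, throwing if either check fails.
function frameToGlobal(frame: Rect, displays: Rect[], x: number, y: number): [number, number] {
  // Pass 1: the point must lie inside the window frame.
  if (x < 0 || y < 0 || x >= frame.width || y >= frame.height) {
    throw new RangeError(`(${x}, ${y}) is outside the window frame`);
  }
  const gx = frame.x + x;
  const gy = frame.y + y;
  // Pass 2: the global point must land on at least one connected display.
  if (!displays.some((d) => contains(d, gx, gy))) {
    throw new RangeError(`(${gx}, ${gy}) is not on any connected display`);
  }
  return [gx, gy];
}
```

With a 1920×1080 primary display and a window whose frame starts at `x = 1800` and is 400 wide, a frame-relative `x` of 200 passes the first check but fails the second, which is the partly off-screen case the JSDoc calls out.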
@@ -1333,6 +1452,43 @@ export declare class Window {
  * [`AccessibilityTree.snapshot`] instead.
  */
  snapshot(): string
+ /**
+ * Capture just this window's pixels.
+ *
+ * Reads from the window's own backing store, so overlapping windows do
+ * **not** bleed through and the whole window is captured even where it is
+ * occluded or extends past a display edge — partially off-screen and
+ * multi-display-spanning windows are captured in full. The window must be
+ * on-screen at capture time: a minimized or fully off-display window
+ * throws, because there is nothing to capture.
+ */
+ screenshot(hideCursor: boolean): Screenshot
+ /**
+ * Locate `concept` inside this window's pixels and return its **global
+ * desktop coordinates** `[x, y]` in OS-native units (may be negative on
+ * multi-monitor setups; see [`MouseController`]), ready to feed straight
+ * into [`MouseController.moveMouse`] / click helpers.
+ *
+ * Sugar for `screenshot(true).ground(model, concept)` — the cursor is
+ * hidden because it can occlude or distract the model. Restricts the
+ * model's search to this window's bounds, which is both faster (fewer
+ * pixels to upload) and more accurate (no risk of grounding onto a
+ * concept that happens to be elsewhere on the screen) than grounding a
+ * full-screen screenshot.
+ *
+ * Drop down to [`Window.screenshot`] + [`Screenshot.ground`] directly
+ * when you want to keep the cursor visible, reuse the screenshot across
+ * multiple `ground` calls, or shrink / compress the image before
+ * sending it.
+ *
+ * The screenshot includes any portion of the window that's past the
+ * desktop edge (the capture reads the window's own backing store, not a
+ * display rectangle). A `concept` the model locates in that off-screen
+ * region returns coordinates the OS won't route a click to;
+ * [`Window.moveMouse`] catches this and errors out instead of clicking
+ * the nearest on-screen point.
+ */
+ ground(model: GroundingModel, concept: string): [number, number]
  /**
  * Search this window's accessibility subtree by *concept text*, using
  * bag-of-words paired-Jaccard scoring against each node's