@simular-ai/simulang-js 8.1.0 → 10.0.0

package/CHANGELOG.md CHANGED
@@ -7,6 +7,31 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [10.0.0] - 2026-07-03
+
+ ### Added
+
+ - `Window.focused()` — the window that currently has keyboard focus, or `null` when nothing is focused (no windows open, focus on the desktop, or the frontmost app has no focused window). The read-side counterpart to the instance method `Window.focus()`. Backed by `GetForegroundWindow` (Windows), the frontmost app's `AXFocusedWindow` (macOS), and `_NET_ACTIVE_WINDOW` (Linux); throws only on genuine backend failures.
+ - `ImageGenModel` for generating images from a text prompt through a provider's `image_gen` service. `generate(prompt, quality?, aspectRatio?, images?)` returns a `[image, description]` tuple — the generated `Image` and the model's text description. `quality` (`ImageQuality.Fast` / `.Pro`) and `aspectRatio` (`ImageAspectRatio.Square`, `.Landscape16x9`, …) are type-safe enums defaulting to `Fast` / `Square`. Optional reference images are downscaled / transcoded and size-checked against the model's configured limits before sending. Discover aliases with `ImageGenModel.availableAliases()`; the bundled `simular_cloud` provider advertises `simular_image_gen`.
+ - `Image.drawBox(bounds, thickness, red, green, blue)` and `Screenshot.drawBox(...)` — paint the outline of an axis-aligned `BoundingBox` (e.g. the result of `Window.boundingBox()` or `AccessibilityNode.boundingBox()`) with an inset border `thickness` pixels wide in an opaque RGB color. On `Image` the box is in image pixels; on `Screenshot` it is in **global desktop** coordinates (converted to image pixels internally), so a grounding or element box can be passed straight in. Out-of-bounds pixels are clipped; throws on zero thickness or a degenerate box (`right <= left` / `bottom <= top`). Companion to `drawDot` for annotating bounding boxes.
+
+ ### Breaking
+
+ - `Screenshot.drawDot(x, y, ...)` now interprets `(x, y)` as **global desktop** coordinates (the same space `Screenshot.ground` returns and `MouseController` consumes) rather than screenshot image pixels; the point is converted to image pixels internally, inverting the capture offset and any resampling. This lets a `Screenshot.ground` result or an element's `boundingBox()` corner be drawn without manual conversion. Update call sites that passed raw image-pixel coordinates to `Screenshot.drawDot` to pass global desktop coordinates instead. `Image.drawDot` is unchanged (still image pixels).
+
+ ## [9.0.0] - 2026-06-29
+
+ ### Added
+
+ - `Image.fromBase64()` now accepts GIF and WebP data URLs/payloads.
+ - `Image.base64DataUrl()` and `Screenshot.base64DataUrl()` for returning MIME-prefixed data URLs.
+ - Bundled OpenRouter config now advertises `openrouter_claude_opus` for both `AskModel` and `GroundingModel`, including Claude-specific grounding coordinate scaling and image size/format limits.
+ - `StateSatisfiesModel` for evaluating screen state against a natural-language condition through the dedicated state-satisfaction provider.
+
+ ### Changed
+
+ - `Image.base64()` and `Screenshot.base64()` now return raw base64 without a `data:image/...;base64,` prefix, matching the Rust API. Use `base64DataUrl()` when a MIME-prefixed data URL is needed.
+
  ## [8.1.0] - 2026-06-23
 
  ### Added
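
For call sites crossing the 9.0.0 change, where `base64()` stopped returning a `data:image/...;base64,` prefix, a small normalizing helper can keep downstream code working with either shape. The sketch below is not part of the package: `stripDataUrlPrefix` and `toDataUrl` are hypothetical names, and the MIME type handed to `toDataUrl` is the caller's own assumption about the encoded format.

```typescript
// Hypothetical migration helpers; NOT part of @simular-ai/simulang-js.

/** Returns the raw base64 payload, whether or not `value` carries a data-URL prefix. */
function stripDataUrlPrefix(value: string): string {
  // Matches e.g. "data:image/png;base64," at the very start of the string.
  const match = /^data:image\/[a-z0-9.+-]+;base64,/i.exec(value);
  return match ? value.slice(match[0].length) : value;
}

/** Builds a MIME-prefixed data URL from a raw (or already prefixed) base64 string. */
function toDataUrl(base64: string, mime: string = "image/png"): string {
  // `mime` is an assumption supplied by the caller; it is not sniffed from the payload.
  return `data:${mime};base64,${stripDataUrlPrefix(base64)}`;
}
```

Where the real format matters, the changelog's own advice applies: call `base64DataUrl()` and let the library supply the prefix.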
package/CLAUDE.md CHANGED
@@ -2,7 +2,7 @@
 
  Node.js bindings for the Rust `simulang-rs` crate (via napi-rs). Cross-platform
  desktop automation: apps, windows, accessibility trees, mouse/keyboard,
- screenshots, clipboard, audio, and VLM/STT model access.
+ screenshots, clipboard, audio, and VLM/LLM/STT model access.
 
  This file ships in the npm tarball alongside `index.d.ts` and is versioned
  with it.
@@ -10,7 +10,7 @@ with it.
  ## Where the API is documented
 
  Read **`index.d.ts`** first — it is the source of truth. Every class,
- function, and enum is fully typed (~1500 lines) and carries JSDoc covering
+ function, and enum is fully typed (~2100 lines) and carries JSDoc covering
  idioms, lifecycle rules, platform quirks, and inter-API trade-offs that types
  alone can't express. The JSDoc is generated from doc comments,
  so the per-symbol guidance there is authoritative — trust it over any
@@ -24,12 +24,22 @@ restatement elsewhere.
  - Many objects are **handles to platform resources** (windows, audio devices,
  accessibility trees, file/directory handles). Their lifetime matters;
  dropping them can free the underlying resource.
- - Coordinates are uniform across platforms: **top-left origin, global
- physical pixels** (also called device pixels — the raw hardware pixels of
- the display, not the logical / CSS / point units used in browsers and
- some desktop UI frameworks). On a HiDPI display, a 1920×1080-logical
- screen is 3840×2160 in these coordinates. Image and screenshot dimensions
- are likewise in physical pixels.
+ - Coordinates live on the **global desktop**: top-left origin at `(0, 0)` on
+ the primary monitor, in **OS-native units** — the unit is **not** the same
+ on every platform:
+ - **Windows / Linux**: **physical pixels** (raw hardware pixels).
+ - **macOS**: **logical points** — on a 2× Retina display one point spans two
+ hardware pixels, so a 1920×1080-logical screen is `1920×1080` here, not
+ `3840×2160`.
+
+ These are the native units the OS input/accessibility APIs expect, **not**
+ the browser logical/CSS pixel. Within a single OS every API speaks that OS's
+ unit, so coordinates round-trip between `MouseController`, the various
+ `boundingBox()` methods, `Screenshot.toGlobalDesktopCoordinates()`, and
+ grounding output **without conversion**; only code crossing into a different
+ coordinate system (e.g. an Electron overlay measured in CSS pixels) must
+ account for the per-OS unit. Monitors arranged to the left of / above the
+ primary contribute **negative** coordinates, so don't assume `x, y >= 0`.
 
  ## Logging is on by default
 
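
The per-OS unit rule in the rewritten bullet can be made concrete with a few lines of arithmetic. The following is only an illustrative sketch: `Platform` and `nativeScreenSize` are hypothetical names that do not exist in the package, and the scale factor is assumed to be known by the caller (for example `2` on a 2× Retina or 200% HiDPI display).

```typescript
// Hypothetical illustration of the unit rule; NOT part of @simular-ai/simulang-js.
type Platform = "windows" | "linux" | "macos";

/**
 * Size of a screen in the OS-native units the bullet describes, given its
 * logical size and backing scale factor.
 */
function nativeScreenSize(
  logicalWidth: number,
  logicalHeight: number,
  scaleFactor: number,
  platform: Platform,
): [number, number] {
  if (platform === "macos") {
    // macOS: logical points, so the scale factor does not enter.
    return [logicalWidth, logicalHeight];
  }
  // Windows / Linux: physical pixels.
  return [
    Math.round(logicalWidth * scaleFactor),
    Math.round(logicalHeight * scaleFactor),
  ];
}
```

With a 1920×1080-logical screen at scale factor 2, this yields `1920×1080` for `"macos"` and `3840×2160` for `"windows"` or `"linux"`, matching the figures in the text.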
package/index.d.ts CHANGED
@@ -347,13 +347,15 @@ export declare class AskModel {
  /**
  * First LLM model advertised by the first LLM-capable provider in the
  * loaded configuration whose credentials are currently available.
- * Throws if no provider in the loaded config advertises an LLM service.
+ * Providers with missing credentials are skipped; throws if no provider
+ * remains available.
  */
  static default(): AskModel
  /**
  * Resolve a model alias against the loaded config (e.g.
- * `"openrouter_gpt_4o_mini"` from the bundled `openrouter` provider, or
- * any alias declared by a user provider). Throws if the alias is unknown.
+ * `"openrouter_gpt_4o_mini"` / `"openrouter_claude_opus"` from the
+ * bundled `openrouter` provider, or any alias declared by a user provider).
+ * Throws if the alias is unknown.
  */
  static byAlias(alias: string): AskModel
  /**
@@ -384,8 +386,8 @@ export declare class AskModel {
  * - `text`: optional accessibility-tree (or any) text included as
  * structural context. Pass `null` / omit to skip.
  * - `images`: zero or more images to attach. Each is encoded as a
- * base64 data URL and sent as an `image_url` chat content part. Pass
- * `null` / omit for none.
+ * base64 data URL in a model-supported format and sent as an
+ * `image_url` chat content part. Pass `null` / omit for none.
  *
  * Returns the trimmed assistant response on success.
  */
@@ -564,14 +566,14 @@ export declare class File {
  export declare class GroundingModel {
  /**
  * First VLM model advertised by the first VLM provider in the loaded
- * config. Throws if no provider in the loaded config advertises a VLM
- * service.
+ * configuration whose credentials are currently available. Providers with
+ * missing credentials are skipped; throws if no provider remains available.
  */
  static default(): GroundingModel
  /**
  * Resolve a model alias against the loaded config (e.g. `"ui_venus_30b"`,
- * `"ui_tars_7b"`, or any alias declared by a user provider). Throws if
- * the alias is unknown.
+ * `"ui_tars_7b"`, `"openrouter_claude_opus"`, or any alias declared by a
+ * user provider). Throws if the alias is unknown.
  */
  static byAlias(alias: string): GroundingModel
  /**
@@ -642,6 +644,21 @@ export declare class Image {
  * captured rect.
  */
  drawDot(x: number, y: number, radius: number, red: number, green: number, blue: number): void
+ /**
+ * Draws the outline of the axis-aligned rectangle `bounds` on the image.
+ * Useful for visualizing bounding boxes returned from grounding, element /
+ * layout queries, ground-truth annotations, etc.
+ *
+ * `bounds` is in image-pixel coordinates and covers `[left, right) ×
+ * [top, bottom)` (`right` / `bottom` exclusive, matching a `BoundingBox`).
+ * The border is `thickness` pixels wide, drawn inset within that range so
+ * nothing is painted outside it, in the opaque RGB `(red, green, blue)`
+ * color. Pixels that fall outside the image bounds are silently clipped, so
+ * it is safe to call with a box that extends past the image. Throws when
+ * `thickness` is `0` or `bounds` is degenerate (`right <= left` or
+ * `bottom <= top`).
+ */
+ drawBox(bounds: BoundingBox, thickness: number, red: number, green: number, blue: number): void
  /**
  * Compress the image by converting it to JPEG with the specified quality.
  *
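
The `drawBox` contract documented in this hunk (half-open `[left, right) × [top, bottom)` range, rejection of zero thickness and degenerate boxes, silent clipping at the image edge) can be mirrored in caller-side code, for instance to pre-check a box before drawing. The sketch below is an assumption-laden illustration, not the library's implementation: `Box` is a hypothetical stand-in whose `left` / `top` / `right` / `bottom` fields are inferred from the JSDoc wording (the real `BoundingBox` type is not shown in this diff), and `visibleRegion` is an invented name.

```typescript
// Hypothetical stand-in for BoundingBox; field names are inferred from the JSDoc.
interface Box {
  left: number;
  top: number;
  right: number;  // exclusive
  bottom: number; // exclusive
}

/**
 * Returns the part of `bounds` that overlaps a `width` x `height` image,
 * or `null` when the box lies entirely outside it. Throws on the same
 * inputs the JSDoc says `drawBox` rejects.
 */
function visibleRegion(bounds: Box, thickness: number, width: number, height: number): Box | null {
  if (thickness <= 0) {
    throw new RangeError("thickness must be at least 1");
  }
  if (bounds.right <= bounds.left || bounds.bottom <= bounds.top) {
    throw new RangeError("degenerate box: right <= left or bottom <= top");
  }
  // Intersect the half-open box with the image rectangle [0, width) x [0, height).
  const clipped: Box = {
    left: Math.max(bounds.left, 0),
    top: Math.max(bounds.top, 0),
    right: Math.min(bounds.right, width),
    bottom: Math.min(bounds.bottom, height),
  };
  return clipped.right > clipped.left && clipped.bottom > clipped.top ? clipped : null;
}
```

A `null` result means a call to `drawBox` with that box would, per the JSDoc, paint nothing rather than throw.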
@@ -663,19 +680,21 @@ export declare class Image {
  shrink(nwidth: number, nheight: number): void
  /** Returns the image dimensions as `[width, height]` in pixels. */
  get dimensions(): [number, number]
+ /** Returns the image encoded as raw base64, without a MIME prefix. */
+ base64(): string
  /**
  * Returns the image encoded as a base64 data URL.
  *
  * The result includes the MIME prefix, for example
- * `data:image/png;base64,...` or `data:image/jpeg;base64,...`.
+ * `data:image/png;base64,...`, `data:image/jpeg;base64,...`,
+ * `data:image/gif;base64,...`, or `data:image/webp;base64,...`.
  */
- base64(): string
+ base64DataUrl(): string
  /**
  * Decodes a base64 image string into an image.
  *
- * Accepts either a raw base64 payload or a data URL such as
- * `data:image/png;base64,...`, `data:image/jpeg;base64,...`, or
- * `data:image/jpg;base64,...`.
+ * Accepts either a raw base64 payload or a PNG, JPEG/JPG, GIF, or WebP
+ * data URL.
  */
  static fromBase64(base64: string): Image
  /**
@@ -687,6 +706,79 @@ export declare class Image {
  ground(model: GroundingModel, concept: string): [number, number]
  }
 
+ /**
+ * A model for generating images from a text prompt (with optional reference
+ * images), speaking Simular's `v1/image-gen` wire contract.
+ *
+ * This targets a dedicated Simular endpoint rather than an `OpenAI`-compatible
+ * one; the endpoint picks the concrete backend model from the [`ImageQuality`]
+ * argument, so the resolved model `name` is nominal (used only for `name`).
+ * Reference images are downscaled / transcoded and size-checked against the
+ * model's configured image limits before sending, exactly as the VLM/LLM image
+ * paths do.
+ */
+ export declare class ImageGenModel {
+ /**
+ * First image-gen model advertised by the first capable provider in the
+ * loaded configuration whose credentials are currently available.
+ * Providers that advertise the service but lack working credentials are
+ * skipped. Throws only if no provider is left after that filter.
+ */
+ static default(): ImageGenModel
+ /**
+ * Resolve a model alias against the loaded configuration.
+ *
+ * The alias is looked up across every loaded provider in alphabetical
+ * order; the first provider that advertises this alias wins. Throws if the
+ * alias is unknown.
+ */
+ static byAlias(alias: string): ImageGenModel
+ /** Convenience shim for the bundled Simular image-gen alias. */
+ static simularImageGen(): ImageGenModel
+ /**
+ * Every image-gen model alias advertised by the loaded configuration,
+ * deduplicated and sorted alphabetically. Use to discover what `byAlias`
+ * will accept on the current machine.
+ */
+ static availableAliases(): Array<string>
+ /**
+ * The model identifier from provider configuration. Nominal — the
+ * endpoint selects the concrete model from the [`ImageQuality`] argument.
+ */
+ get name(): string
+ /**
+ * Check that provider credentials work. Throws (and logs a warning) if they
+ * do not.
+ *
+ * Call right after creating the model so a bad key fails fast:
+ *
+ * ```ts
+ * try { model.checkAuth() } catch { process.exit(1) }
+ * ```
+ */
+ checkAuth(): void
+ /**
+ * Generate an image from `prompt`.
+ *
+ * - `prompt`: a description of the image to generate.
+ * - `quality`: selects the model tier; defaults to `ImageQuality.Fast`.
+ * - `aspectRatio`: the output shape; defaults to `ImageAspectRatio.Square`.
+ * - `images`: reference images (count capped by the model's configured
+ * `max_images_per_request` image limit) that condition the result; each is
+ * downscaled / transcoded to a supported format and size-checked before
+ * sending. Pass `null` / omit for none.
+ *
+ * Returns a `[image, description]` tuple: the first generated [`Image`] and
+ * the model's text description (an empty string when the model returns none).
+ */
+ generate(
+ prompt: string,
+ quality?: ImageQuality | undefined | null,
+ aspectRatio?: ImageAspectRatio | undefined | null,
+ images?: Array<Image> | undefined | null,
+ ): [Image, string]
+ }
+
  /** Represents an opened application instance. */
  export declare class Instance {
  /** Get the process ID of the opened application (returns 0 when unknown). */
@@ -1120,20 +1212,39 @@ export declare class Screenshot {
  */
  drawGrid(width: number, height: number): void
  /**
- * Paints a filled disc on the image. Useful for visualizing point
- * coordinates returned from grounding, layout queries, etc.
+ * Paints a filled disc on the screenshot. Useful for visualizing point
+ * coordinates returned from grounding, element / layout queries, etc.
  *
- * `x` / `y` are image-pixel coordinates of the disc's centre. `radius`
- * is the disc radius in pixels (`0` paints a single pixel at the
- * centre). `(red, green, blue)` is the fill color; alpha is always 255
- * (opaque replacement of the underlying pixel).
+ * `x` / `y` are **global desktop** coordinates (the same space
+ * `Screenshot.ground` returns and `MouseController` consumes), which are
+ * converted to image pixels inverting the capture offset and any
+ * resampling before drawing. So a `ground(...)` result or an element's
+ * `boundingBox()` corner can be passed straight in. `radius` is the disc
+ * radius in pixels (`0` paints a single pixel at the centre).
+ * `(red, green, blue)` is the fill color; alpha is always 255 (opaque
+ * replacement of the underlying pixel).
  *
- * Coordinates that fall outside the image bounds (negative, or past the
- * width / height) silently produce no pixel, so the helper is safe to
- * call with the raw output of a grounding model even at the edge of the
- * captured rect.
+ * Coordinates that map outside the image bounds silently produce no pixel,
+ * so the helper is safe to call even for points that fall outside the
+ * captured region.
  */
  drawDot(x: number, y: number, radius: number, red: number, green: number, blue: number): void
+ /**
+ * Draws the outline of the axis-aligned rectangle `bounds` on the
+ * screenshot. Useful for visualizing bounding boxes returned from grounding,
+ * element / window `boundingBox()` queries, ground-truth annotations, etc.
+ *
+ * `bounds` is in **global desktop** coordinates (the same space element /
+ * window `boundingBox()` and grounding results use); its corners are
+ * converted to image pixels — inverting the capture offset and any
+ * resampling — before drawing, so an element's `boundingBox()` can be passed
+ * straight in. It covers `[left, right) × [top, bottom)` (`right` / `bottom`
+ * exclusive). The border is `thickness` pixels wide, drawn inset, in the
+ * opaque RGB `(red, green, blue)` color. Pixels that map outside the image
+ * bounds are silently clipped. Throws when `thickness` is `0` or `bounds` is
+ * degenerate (`right <= left` or `bottom <= top`).
+ */
+ drawBox(bounds: BoundingBox, thickness: number, red: number, green: number, blue: number): void
  /**
  * Compress the image by converting it to JPEG with the specified quality.
  *
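
The breaking `Screenshot.drawDot` change hinges on a mapping from global desktop coordinates to image pixels that "inverts the capture offset and any resampling". The diff does not show how the library performs that mapping, so the following is only a plausible model of it: `CaptureGeometry`, its fields, the uniform per-axis scale, and the use of `Math.floor` are all assumptions made for illustration, and `globalToImagePixel` is a hypothetical name.

```typescript
// Hypothetical model of a capture; NOT the library's internal representation.
interface CaptureGeometry {
  originX: number;     // global desktop x of the captured rect's top-left corner
  originY: number;     // global desktop y of the captured rect's top-left corner
  scaleX: number;      // image pixels per desktop unit after resampling
  scaleY: number;
  imageWidth: number;  // image size in pixels
  imageHeight: number;
}

/**
 * Maps a global desktop point to an image pixel, or returns `null` when the
 * point falls outside the image (the case the JSDoc says draws nothing).
 */
function globalToImagePixel(x: number, y: number, g: CaptureGeometry): [number, number] | null {
  // Undo the capture offset first, then apply the resampling scale.
  const px = Math.floor((x - g.originX) * g.scaleX);
  const py = Math.floor((y - g.originY) * g.scaleY);
  const inside = px >= 0 && py >= 0 && px < g.imageWidth && py < g.imageHeight;
  return inside ? [px, py] : null;
}
```

For a monitor placed to the left of the primary (origin `(-1920, 0)`) and captured at half resolution, the desktop point `(-1820, 200)` maps to pixel `(50, 100)` under this model. Code written against 8.x that already held image pixels would need the inverse step, which `Screenshot.toGlobalDesktopCoordinates()` appears to provide.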
@@ -1155,13 +1266,16 @@ export declare class Screenshot {
  shrink(nwidth: number, nheight: number): void
  /** Returns the screenshot dimensions as `[width, height]` in pixels. */
  get dimensions(): [number, number]
+ /** Returns the image encoded as raw base64, without a MIME prefix. */
+ base64(): string
  /**
  * Returns the image encoded as a base64 data URL.
  *
  * The result includes the MIME prefix, for example
- * `data:image/png;base64,...` or `data:image/jpeg;base64,...`.
+ * `data:image/png;base64,...`, `data:image/jpeg;base64,...`,
+ * `data:image/gif;base64,...`, or `data:image/webp;base64,...`.
  */
- base64(): string
+ base64DataUrl(): string
  /**
  * Converts a point in this screenshot to global desktop coordinates.
  *
@@ -1210,6 +1324,44 @@ export declare class ScreenshotCoordinateType {
  static normalized(range: number): ScreenshotCoordinateType
  }
 
+ /**
+ * A dedicated state-satisfaction model that evaluates whether observed screen
+ * state satisfies a natural-language condition.
+ *
+ * This speaks Simular's `v1/perception/state_satisfies` endpoint. It is
+ * separate from `AskModel`, which remains an OpenAI-compatible
+ * chat-completions client.
+ */
+ export declare class StateSatisfiesModel {
+ /**
+ * First state-satisfaction model advertised by the first capable provider
+ * in the loaded configuration whose credentials are currently available.
+ * Throws if no provider advertises the state-satisfies service.
+ */
+ static default(): StateSatisfiesModel
+ /**
+ * Resolve a model alias against the loaded config. Throws if the alias is
+ * unknown.
+ */
+ static byAlias(alias: string): StateSatisfiesModel
+ /** Convenience shim for the bundled `simular_state_satisfies` alias. */
+ static simularStateSatisfies(): StateSatisfiesModel
+ /**
+ * Every state-satisfaction model alias accepted by `byAlias` on this
+ * machine, deduplicated and sorted alphabetically.
+ */
+ static availableAliases(): Array<string>
+ /** The provider-configured model identifier. */
+ get name(): string
+ /** Check that the provider credentials work. Throws if they do not. */
+ checkAuth(): void
+ /**
+ * Evaluate `condition` against the supplied accessibility text and raw
+ * screenshot base64.
+ */
+ stateSatisfies(condition: string, text: string, image: string): boolean
+ }
+
  /** A speech-to-text model that can transcribe audio. */
  export declare class SttModel {
  /**
@@ -1296,6 +1448,24 @@ export declare class Window {
  * - **Linux**: always `null` (no per-point window hit-test).
  */
  static fromPoint(x: number, y: number): Window | null
+ /**
+ * The window that currently has keyboard focus, if any.
+ *
+ * Returns `null` when nothing is focused (e.g. no windows open, focus is on
+ * the desktop, or the frontmost app has no focused window) rather than
+ * treating that as an error. Throwing is reserved for genuine backend
+ * failures.
+ *
+ * Platform notes:
+ * - **Windows**: `GetForegroundWindow` — the top-level window of the
+ * foreground thread. Never throws; `null` only when there is no
+ * foreground window at all.
+ * - **macOS**: the frontmost running application's `AXFocusedWindow`
+ * attribute. `null` when there is no frontmost app, or the frontmost app
+ * has no focused window (e.g. all its windows are minimized).
+ * - **Linux**: EWMH `_NET_ACTIVE_WINDOW` root-window property.
+ */
+ static focused(): Window | null
  /** Window title (may be empty). */
  get title(): string
  /** Process ID that owns this window. */
@@ -1655,6 +1825,37 @@ export declare enum FocusPolicy {
 
  export declare function hasScreenCapturePermission(): boolean
 
+ /** Aspect ratio of the generated image. */
+ export declare enum ImageAspectRatio {
+ /** `1:1` square. Default. */
+ Square = 0,
+ /** `16:9` widescreen landscape. */
+ Landscape16x9 = 1,
+ /** `9:16` tall portrait. */
+ Portrait9x16 = 2,
+ /** `4:3` landscape. */
+ Landscape4x3 = 3,
+ /** `3:4` portrait. */
+ Portrait3x4 = 4,
+ /** `3:2` landscape. */
+ Landscape3x2 = 5,
+ /** `2:3` portrait. */
+ Portrait2x3 = 6,
+ }
+
+ /**
+ * Output image quality / model tier for [`ImageGenModel.generate`].
+ *
+ * The image-gen endpoint selects the underlying model from this value, so it
+ * is a per-request parameter rather than a config-resolved model name.
+ */
+ export declare enum ImageQuality {
+ /** Faster, cheaper tier. Default. */
+ Fast = 0,
+ /** Higher-fidelity tier. */
+ Pro = 1,
+ }
+
  /**
  * Initialize the logger.
  *
  *