webmarker-js 0.0.6 → 0.0.7

package/README.md CHANGED
@@ -13,58 +13,143 @@ Mark web pages for use with vision-language models.
 
  🚧 Under Construction
 
- **WebMarker** adds visual markings with labels to elements on a web page. This can be used for [Set-of-Mark (SoM)](https://github.com/microsoft/SoM) prompting, which improves visual grounding abilities of vision-language models such as GPT-4V, Claude 3, and Google Gemini 1.5.
+ **WebMarker** adds visual markings with labels to elements on a web page. This can be used for [Set-of-Mark (SoM)](https://github.com/microsoft/SoM) prompting, which improves visual grounding abilities of vision-language models such as GPT-4o, Claude 3.5, and Google Gemini 1.5.
 
- ## Usage
+ ## How it works
 
- The `mark()` function will add markings for all interactive elements on a web page, and return an object containing the marked elements. The returned objects's keys are the mark labels, and the values are an object with the element (`element`), mark element (`markElement`), and mask element (`maskElement`).
+ **1. Call the `mark()` function**
+
+ This marks the interactive elements on the page, and returns an object containing the marked elements, where each key is a mark label string, and each value is an object with the following properties:
+
+ - `element`: The interactive element that was marked.
+ - `markElement`: The label element that was added to the page.
+ - `maskElement`: The bounding box element that was added to the page.
+
+ You can use this information to build your prompt for the vision-language model.
+
+ **2. Send a screenshot of the marked page to a vision-language model, along with your prompt**
+
+ Example prompt:
 
  ```javascript
- import { mark, unmark } from "webmarker-js";
+ let markedElements = await mark();
+
+ let prompt = `The following is a screenshot of a web page.
+
+ Interactive elements have been marked with red bounding boxes and labels.
 
- // Mark interactive elements on the document
- let elements = await mark();
+ When referring to elements, use the labels to identify them.
 
- // Reference an element by label
- console.log(elements["0"].element);
+ Return an action and element to perform the action on.
 
- // Remove markings
- unmark();
+ Available actions: click, hover
+
+ Available elements:
+ ${Object.keys(markedElements)
+ .map((label) => `- ${label}`)
+ .join("\n")}
+
+ Example response: click 0
+ `;
+ ```
+
+ **3. Programmatically interact with the marked elements.**
+
+ In a web browser (i.e. via Playwright), interact with elements as needed.
+
+ For prompting or agent ideas, see the [WebVoyager](https://arxiv.org/abs/2401.13919) paper.
+
+ ## Playwright example
+
+ ```javascript
+ // Inject the WebMarker library into the page
+ await page.addScriptTag({
+ url: "https://cdn.jsdelivr.net/npm/webmarker-js/dist/main.js",
+ });
+
+ // Mark the page and get the marked elements
+ let markedElements = await page.evaluate(async () => await WebMarker.mark());
+
+ // (Optional) Check if page is marked
+ let isMarked = await page.evaluate(async () => await WebMarker.isMarked());
+
+ // (Optional) Unmark the page
+ await page.evaluate(async () => await WebMarker.unmark());
  ```
 
  ## Options
 
- - **selector**: A custom CSS selector to specify which elements to mark.
- - Type: `string`
- - Default: `"button, input, a, select, textarea"`
- - **markStyle**: A CSS style to apply to the label element. You can also specify a function that returns a CSS style object.
- - Type: `Readonly<Partial<CSSStyleDeclaration>> or (element: Element) => Readonly<Partial<CSSStyleDeclaration>>`
- - Default: `{backgroundColor: "red", color: "white", padding: "2px 4px", fontSize: "12px", fontWeight: "bold"}`
- - **markPlacement**: The placement of the mark relative to the element.
- - Type: `'top' | 'top-start' | 'top-end' | 'right' | 'right-start' | 'right-end' | 'bottom' | 'bottom-start' | 'bottom-end' | 'left' | 'left-start' | 'left-end'`
- - Default: `'top-start'`
- - **maskStyle**: A CSS style to apply to the bounding box element. You can also specify a function that returns a CSS style object. Bounding boxes are only shown if showMasks is true.
- - Type: `Readonly<Partial<CSSStyleDeclaration>> or (element: Element) => Readonly<Partial<CSSStyleDeclaration>>`
- - Default: `{outline: "2px dashed red", backgroundColor: "transparent"}`
- - **showMasks**: Whether or not to show bounding boxes around the elements.
- - Type: `boolean`
- - Default: `true`
- - **labelGenerator**: Provide a function for generating labels. By default, labels are generated as numbers starting from 0.
- - Type: `(element: Element, index: number) => string`
- - Default: `(_, index) => index.toString()`
- - **containerElement**: Provide a container element to query the elements to be marked. By default, the container element is document.body.
- - Type: `Element`
- - Default: `document.body`
- - **viewPortOnly**: Only mark elements that are visible in the current viewport.
- - Type: `boolean`
- - Default: `false`
+ ### selector
+
+ A custom CSS selector to specify which elements to mark.
+
+ - Type: `string`
+ - Default: `"button, input, a, select, textarea"`
+
+ ### markAttribute
+
+ A custom attribute to add to the marked elements. This attribute contains the label of the mark.
+
+ - Type: `string`
+ - Default: `"data-mark-id"`
+
+ ### markStyle
+
+ A CSS style to apply to the label element. You can also specify a function that returns a CSS style object.
+
+ - Type: `Readonly<Partial<CSSStyleDeclaration>> or (element: Element) => Readonly<Partial<CSSStyleDeclaration>>`
+ - Default: `{backgroundColor: "red", color: "white", padding: "2px 4px", fontSize: "12px", fontWeight: "bold"}`
+
+ ### markPlacement
+
+ The placement of the mark relative to the element.
+
+ - Type: `'top' | 'top-start' | 'top-end' | 'right' | 'right-start' | 'right-end' | 'bottom' | 'bottom-start' | 'bottom-end' | 'left' | 'left-start' | 'left-end'`
+ - Default: `'top-start'`
+
+ ### maskStyle
+
+ A CSS style to apply to the bounding box element. You can also specify a function that returns a CSS style object. Bounding boxes are only shown if showMasks is true.
+
+ - Type: `Readonly<Partial<CSSStyleDeclaration>> or (element: Element) => Readonly<Partial<CSSStyleDeclaration>>`
+ - Default: `{outline: "2px dashed red", backgroundColor: "transparent"}`
+
+ ### showMasks
+
+ Whether or not to show bounding boxes around the elements.
+
+ - Type: `boolean`
+ - Default: `true`
+
+ ### labelGenerator
+
+ Provide a function for generating labels. By default, labels are generated as integers starting from 0.
+
+ - Type: `(element: Element, index: number) => string`
+ - Default: `(_, index) => index.toString()`
+
+ ### containerElement
+
+ Provide a container element to query the elements to be marked. By default, the container element is document.body.
+
+ - Type: `Element`
+ - Default: `document.body`
+
+ ### viewPortOnly
+
+ Only mark elements that are visible in the current viewport.
+
+ - Type: `boolean`
+ - Default: `false`
 
  ### Advanced example
 
  ```typescript
- let elements = mark({
+ const markedElements = await mark({
  // Only mark buttons and inputs
  selector: "button, input",
+ // Use test id attribute for marker labels
+ markAttribute: "data-test-id",
  // Use a blue mark with white text
  markStyle: { color: "white", backgroundColor: "blue", padding: 5 },
  // Use a blue dashed outline mask with a transparent and slighly blue background
@@ -82,11 +167,4 @@ let elements = mark({
  // Only mark elements that are visible in the current viewport
  viewPortOnly: true,
  });
-
- // Cleanup
- unmark();
  ```
-
- ## Use with Playwright
-
- Coming soon
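Step 3 of the new README leaves the interaction code to the reader. The sketch below shows one way to connect the model's reply to the page. It assumes the model answers in the `click 0` format requested by the example prompt. It parses that reply, then builds a CSS attribute selector from the label. The selector depends on `mark()` calling `element.setAttribute(markAttribute, label)`, where `markAttribute` defaults to `data-mark-id`, as the source changes in this release show. `parseAction` and `selectorForLabel` are hypothetical helpers and are not part of webmarker-js.

```typescript
// Actions offered by the README's example prompt ("Available actions: click, hover").
type Action = "click" | "hover";

interface ParsedAction {
  action: Action;
  label: string;
}

// Parse a reply such as "click 0" (the format the example prompt asks for).
// Returns null when the reply does not match that format or names a label
// that was not among the marked elements.
function parseAction(reply: string, labels: readonly string[]): ParsedAction | null {
  const match = /^\s*(click|hover)\s+(\S+)\s*$/i.exec(reply);
  if (match === null) return null;
  const action = match[1].toLowerCase() as Action;
  const label = match[2];
  return labels.includes(label) ? { action, label } : null;
}

// Build a CSS attribute selector for a marked element. mark() writes the label
// with element.setAttribute(markAttribute, label); markAttribute defaults to
// "data-mark-id". Quotes and backslashes are escaped in case of custom labels.
function selectorForLabel(label: string, markAttribute: string = "data-mark-id"): string {
  const escaped = label.replace(/["\\]/g, "\\$&");
  return `[${markAttribute}="${escaped}"]`;
}
```

`labels` would typically be `Object.keys(markedElements)`. In a Playwright script, you could pass the resulting selector to `page.click()` or `page.hover()` while the page is still marked. If `parseAction` returns `null`, re-prompt the model rather than guessing an element.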
package/dist/index.d.ts CHANGED
@@ -4,6 +4,12 @@ interface MarkOptions {
  * A CSS selector to specify the elements to be marked.
  */
  selector?: string;
+ /**
+ * Name for the attribute added to the marked elements. This attribute is used to store the label.
+ *
+ * @default 'data-mark-id'
+ */
+ markAttribute?: string;
  /**
  * A CSS style to apply to the label element.
  * You can also specify a function that returns a CSS style object.
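`mark()` resolves the new `markAttribute` option with a destructuring default. The sketch below mirrors only the two fields visible in this hunk to illustrate the fallback rules. The interface name `MarkOptionsSubset` and the function `resolveMarkOptions` are hypothetical. If the option is omitted or `undefined`, it falls back to `"data-mark-id"`. Any other value, including an empty string, is used as given. `mark()` writes the label with `setAttribute`, so pointing `markAttribute` at an attribute the page already uses replaces that attribute's existing value on every marked element. The README's advanced example does exactly this with `data-test-id`.

```typescript
// Hypothetical partial mirror of MarkOptions: only the fields visible in this hunk.
interface MarkOptionsSubset {
  selector?: string;
  markAttribute?: string;
}

// Applies the same destructuring defaults that mark() uses in 0.0.7.
// A default is used only when the field is missing or undefined.
function resolveMarkOptions(options: MarkOptionsSubset = {}): Required<MarkOptionsSubset> {
  const {
    selector = "button, input, a, select, textarea",
    markAttribute = "data-mark-id",
  } = options;
  return { selector, markAttribute };
}

console.log(resolveMarkOptions().markAttribute); // prints data-mark-id
console.log(resolveMarkOptions({ markAttribute: "data-test-id" }).markAttribute); // prints data-test-id
```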
package/dist/main.js CHANGED
@@ -932,6 +932,7 @@ var WebMarker = (() => {
  return __async(this, arguments, function* (options = {}) {
  const {
  selector = "button, input, a, select, textarea",
+ markAttribute = "data-mark-id",
  markStyle = {
  backgroundColor: "red",
  color: "white",
@@ -961,7 +962,7 @@ var WebMarker = (() => {
  const markElement = createMark(element, markStyle, label, markPlacement);
  const maskElement = showMasks ? createMask(element, maskStyle, label) : void 0;
  markedElements[label] = { element, markElement, maskElement };
- element.setAttribute("data-webmarkeredby", `webmarker-${label}`);
+ element.setAttribute(markAttribute, label);
  }))
  );
  document.documentElement.dataset.webmarkered = "true";
@@ -988,8 +989,8 @@ var WebMarker = (() => {
  }
  function createMask(element, style, label) {
  const maskElement = document.createElement("div");
- maskElement.className = "webmarkermask";
- maskElement.id = `webmarkermask-${label}`;
+ maskElement.className = "webmarker-mask";
+ maskElement.id = `webmarker-mask-${label}`;
  document.body.appendChild(maskElement);
  positionMask(maskElement, element);
  applyStyle(
@@ -1040,7 +1041,7 @@ var WebMarker = (() => {
  Object.assign(element.style, defaultStyle, customStyle);
  }
  function unmark() {
- document.querySelectorAll(".webmarker, .webmarkermask").forEach((el) => el.remove());
+ document.querySelectorAll(".webmarker, .webmarker-mask").forEach((el) => el.remove());
  document.documentElement.removeAttribute("data-webmarkered");
  cleanupFns.forEach((fn) => fn());
  cleanupFns = [];
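This release renames the DOM hooks that `mark()` leaves behind. Masks move from the `webmarkermask` class and `webmarkermask-<label>` id to `webmarker-mask` and `webmarker-mask-<label>`. Marked elements no longer get the fixed `data-webmarkeredby="webmarker-<label>"` attribute and get `<markAttribute>="<label>"` instead. Scripts that query the old names will stop matching after the upgrade. Two hooks stay the same in both versions: the `.webmarker` class on label elements and the `data-webmarkered` attribute on `<html>`. The sketch below produces version-aware selectors. It assumes the default numeric labels, which need no CSS escaping. The `WebMarkerVersion` type and both helper functions are hypothetical.

```typescript
type WebMarkerVersion = "0.0.6" | "0.0.7";

// Selector for an element that mark() has labelled.
// 0.0.6 always wrote data-webmarkeredby="webmarker-<label>";
// 0.0.7 writes <markAttribute>="<label>" (default attribute: data-mark-id).
function markedElementSelector(
  version: WebMarkerVersion,
  label: string,
  markAttribute: string = "data-mark-id",
): string {
  return version === "0.0.6"
    ? `[data-webmarkeredby="webmarker-${label}"]`
    : `[${markAttribute}="${label}"]`;
}

// Selector for mask (bounding box) elements: all masks by class, or one mask by id.
function maskSelector(version: WebMarkerVersion, label?: string): string {
  const prefix = version === "0.0.6" ? "webmarkermask" : "webmarker-mask";
  return label === undefined ? `.${prefix}` : `#${prefix}-${label}`;
}
```

For example, running `document.querySelectorAll(maskSelector("0.0.7"))` in the page would match the same mask elements that the 0.0.7 `unmark()` removes.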
package/dist/module.js CHANGED
@@ -908,6 +908,7 @@ function mark() {
  return __async(this, arguments, function* (options = {}) {
  const {
  selector = "button, input, a, select, textarea",
+ markAttribute = "data-mark-id",
  markStyle = {
  backgroundColor: "red",
  color: "white",
@@ -937,7 +938,7 @@ function mark() {
  const markElement = createMark(element, markStyle, label, markPlacement);
  const maskElement = showMasks ? createMask(element, maskStyle, label) : void 0;
  markedElements[label] = { element, markElement, maskElement };
- element.setAttribute("data-webmarkeredby", `webmarker-${label}`);
+ element.setAttribute(markAttribute, label);
  }))
  );
  document.documentElement.dataset.webmarkered = "true";
@@ -964,8 +965,8 @@ function createMark(element, style, label, markPlacement = "top-start") {
  }
  function createMask(element, style, label) {
  const maskElement = document.createElement("div");
- maskElement.className = "webmarkermask";
- maskElement.id = `webmarkermask-${label}`;
+ maskElement.className = "webmarker-mask";
+ maskElement.id = `webmarker-mask-${label}`;
  document.body.appendChild(maskElement);
  positionMask(maskElement, element);
  applyStyle(
@@ -1016,7 +1017,7 @@ function applyStyle(element, defaultStyle, customStyle) {
  Object.assign(element.style, defaultStyle, customStyle);
  }
  function unmark() {
- document.querySelectorAll(".webmarker, .webmarkermask").forEach((el) => el.remove());
+ document.querySelectorAll(".webmarker, .webmarker-mask").forEach((el) => el.remove());
  document.documentElement.removeAttribute("data-webmarkered");
  cleanupFns.forEach((fn) => fn());
  cleanupFns = [];
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "webmarker-js",
- "version": "0.0.6",
+ "version": "0.0.7",
  "description": "A library for marking web pages for Set-of-Mark (SoM) prompting with vision-language models.",
  "source": "src/index.ts",
  "main": "dist/main.js",
package/src/index.ts CHANGED
@@ -19,6 +19,12 @@ interface MarkOptions {
  * A CSS selector to specify the elements to be marked.
  */
  selector?: string;
+ /**
+ * Name for the attribute added to the marked elements. This attribute is used to store the label.
+ *
+ * @default 'data-mark-id'
+ */
+ markAttribute?: string;
  /**
  * A CSS style to apply to the label element.
  * You can also specify a function that returns a CSS style object.
@@ -77,6 +83,7 @@ async function mark(
  ): Promise<Record<string, MarkedElement>> {
  const {
  selector = "button, input, a, select, textarea",
+ markAttribute = "data-mark-id",
  markStyle = {
  backgroundColor: "red",
  color: "white",
@@ -113,7 +120,7 @@ async function mark(
  : undefined;
 
  markedElements[label] = { element, markElement, maskElement };
- element.setAttribute("data-webmarkeredby", `webmarker-${label}`);
+ element.setAttribute(markAttribute, label);
  })
  );
 
@@ -155,8 +162,8 @@ function createMask(
  label: string
  ): HTMLElement {
  const maskElement = document.createElement("div");
- maskElement.className = "webmarkermask";
- maskElement.id = `webmarkermask-${label}`;
+ maskElement.className = "webmarker-mask";
+ maskElement.id = `webmarker-mask-${label}`;
  document.body.appendChild(maskElement);
  positionMask(maskElement, element);
  applyStyle(
@@ -215,7 +222,7 @@ function applyStyle(
 
  function unmark(): void {
  document
- .querySelectorAll(".webmarker, .webmarkermask")
+ .querySelectorAll(".webmarker, .webmarker-mask")
  .forEach((el) => el.remove());
  document.documentElement.removeAttribute("data-webmarkered");
  cleanupFns.forEach((fn) => fn());
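In `src/index.ts`, each result is stored as `markedElements[label] = { element, markElement, maskElement }`. Labels therefore serve both as record keys and as the value written to `markAttribute`. The sketch below is a custom `labelGenerator` that relies only on the documented signature `(element, index) => string`. It produces spreadsheet-style letters (A to Z, then AA, AB, and so on) using bijective base-26 numbering, so every index yields a distinct label. The generator is generic over the element type so it runs without DOM typings. `letterLabel` and `findDuplicateLabels` are hypothetical names. `findDuplicateLabels` checks a generator up front, because a repeated label would overwrite an earlier entry in the returned record.

```typescript
// Same shape as the documented labelGenerator option, but generic over the
// element type so this sketch runs outside a browser (no DOM types needed).
type LabelGenerator<E> = (element: E, index: number) => string;

// Documented default: "0", "1", "2", ...
const indexLabel: LabelGenerator<unknown> = (_element, index) => index.toString();

// Hypothetical alternative: A..Z, AA..AZ, BA..ZZ, AAA...
// Bijective base-26 has no zero digit, so every index maps to a distinct label.
const letterLabel: LabelGenerator<unknown> = (_element, index) => {
  let n = index + 1;
  let label = "";
  while (n > 0) {
    const remainder = (n - 1) % 26;
    label = String.fromCharCode(65 + remainder) + label; // 65 is "A"
    n = Math.floor((n - 1) / 26);
  }
  return label;
};

// mark() assigns markedElements[label], so a repeated label silently replaces
// the earlier entry. Return every label the generator produces more than once.
function findDuplicateLabels<E>(elements: readonly E[], generate: LabelGenerator<E>): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  elements.forEach((element, index) => {
    const label = generate(element, index);
    if (seen.has(label)) duplicates.add(label);
    seen.add(label);
  });
  return Array.from(duplicates);
}
```

If you pass `labelGenerator: letterLabel` in the options to `mark()`, it would label elements A, B, C, and so on. The `click 0` response format in the README's example prompt would then need to use letters as well.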