@geometra/mcp 1.18.1 → 1.19.0

package/README.md CHANGED
@@ -1,6 +1,8 @@
1
1
  # @geometra/mcp
2
2
 
3
- MCP server for [Geometra](https://github.com/razroo/geometra) — interact with running Geometra apps via the geometry protocol over WebSocket. For **native** Geometra apps there is no browser in the loop. For **any existing website**, pair this MCP server with [`@geometra/proxy`](../packages/proxy/README.md) (headless Chromium) so the same tools speak the same GEOM v1 wire format.
3
+ MCP server for [Geometra](https://github.com/razroo/geometra) — interact with running Geometra apps via the geometry protocol over WebSocket. For **native** Geometra apps there is no browser in the loop. For **any existing website**, use **`geometra_connect` with `pageUrl`** — the MCP server starts [`@geometra/proxy`](../packages/proxy/README.md) for you (bundled dependency) so you do not need a separate terminal or a `ws://` URL. You can still pass `url: "ws://…"` if a proxy is already running.
4
+
5
+ See [`AGENT_MODEL.md`](./AGENT_MODEL.md) for the MCP mental model, why token usage can be lower than large browser snapshots, and how headed vs headless proxy mode works.
4
6
 
5
7
  ## What this does
6
8
 
@@ -9,16 +11,17 @@ Connects Claude Code, Codex, or any MCP-compatible AI agent to a WebSocket endpo
9
11
  ```
10
12
  Playwright + vision: screenshot → model → guess coordinates → click → repeat
11
13
  Native Geometra: WebSocket → JSON geometry (no browser on the agent path)
12
- Geometra proxy: Headless Chromium → DOM geometry → same WebSocket as native → MCP tools unchanged
14
+ Geometra proxy: Chromium → DOM geometry → same WebSocket as native → MCP tools unchanged (often started via `pageUrl`, no manual CLI)
13
15
  ```
14
16
 
15
17
  ## Tools
16
18
 
17
19
  | Tool | Description |
18
20
  |---|---|
19
- | `geometra_connect` | Connect to a running Geometra server |
20
- | `geometra_query` | Find elements by role, name, or text content |
21
- | `geometra_page_model` | Higher-level webpage model: landmarks, forms, dialogs, lists, short previews |
21
+ | `geometra_connect` | Connect with `url` (ws://…) **or** `pageUrl` (https://…) to auto-start geometra-proxy |
22
+ | `geometra_query` | Find elements by stable id, role, name, or text content |
23
+ | `geometra_page_model` | Summary-first webpage model: archetypes, stable section ids, counts, top-level sections, primary actions |
24
+ | `geometra_expand_section` | Expand one form/dialog/list/landmark from `geometra_page_model` on demand |
22
25
  | `geometra_click` | Click an element by coordinates |
23
26
  | `geometra_type` | Type text into the focused element |
24
27
  | `geometra_key` | Send special keys (Enter, Tab, Escape, arrows) |
@@ -87,6 +90,8 @@ npx geometra-proxy http://localhost:8080 --port 3200
87
90
  # Requires Chromium: npx playwright install chromium
88
91
  ```
89
92
 
93
+ `geometra-proxy` opens a **visible Chromium window by default**. For servers or CI, pass **`--headless`** or set **`GEOMETRA_HEADLESS=1`**. Optional **`--slow-mo <ms>`** slows Playwright actions so they are easier to watch. Headed vs headless usually does **not** materially change token usage, since token usage is driven by MCP tool output rather than whether Chromium is visible.
94
+
90
95
  Point MCP at `ws://127.0.0.1:3200` instead of a native Geometra server. The proxy translates clicks and keyboard messages into Playwright actions and streams updated geometry.
91
96
 
92
97
  Then in Claude Code (either backend):
@@ -94,7 +99,7 @@ Then in Claude Code (either backend):
94
99
  ```
95
100
  > Connect to my Geometra app at ws://localhost:3100 and tell me what's on screen
96
101
 
97
- > Give me the page model first, then find the main form
102
+ > Give me the page model first, then expand the main form
98
103
 
99
104
  > Click the "Submit" button
100
105
 
@@ -136,7 +141,10 @@ Agent: geometra_connect({ url: "ws://127.0.0.1:3200" })
136
141
  → Connected. UI includes textbox "Email", button "Save", …
137
142
 
138
143
  Agent: geometra_page_model({})
139
- → {"viewport":{"width":1024,"height":768},"landmarks":[...],"forms":[...],"dialogs":[],"lists":[]}
144
+ → {"viewport":{"width":1024,"height":768},"archetypes":["shell","form"],"summary":{...},"forms":[{"id":"fm:1.0","fieldCount":3,"actionCount":1}], ...}
145
+
146
+ Agent: geometra_expand_section({ id: "fm:1.0" })
147
+ → {"id":"fm:1.0","kind":"form","fields":[{"id":"n:1.0.0","name":"Email"}, ...], "actions":[...]}
140
148
 
141
149
  Agent: geometra_query({ role: "textbox", name: "Email" })
142
150
  → bounds for the email field (viewport coordinates)
@@ -156,9 +164,10 @@ Agent: geometra_query({ role: "button", name: "Save" })
156
164
  2. It receives the computed layout (`{ x, y, width, height }` for every node) and the UI tree (`kind`, `semantic`, `props`, `handlers`, `children`).
157
165
  3. It builds an accessibility tree from that data — roles, names, focusable state, bounds.
158
166
  4. **`geometra_snapshot`** defaults to a **compact** flat list of viewport-visible actionable nodes (minified JSON) to reduce LLM tokens; use `view: "full"` for the complete nested tree.
159
- 5. **`geometra_page_model`** extracts higher-level webpage structure (landmarks, forms, dialogs, lists) so agents can reason about normal HTML pages without pulling a full tree first.
160
- 6. After interactions, action tools return a **semantic delta** when possible (dialogs opened/closed, forms appeared/removed, list counts changed, named/focusable nodes added/removed/updated). If nothing meaningful changed, they fall back to a short current-UI overview.
161
- 7. Tools expose query, click, type, snapshot, and page-model operations over this structured data.
162
- 8. After each interaction, the peer sends updated geometry (full `frame` or `patch`) the MCP tools interpret that into compact summaries.
167
+ 5. **`geometra_page_model`** is summary-first: page archetypes, stable section ids, counts, top-level landmarks/forms/dialogs/lists, and a few primary actions. It is designed to be cheaper than dumping full previews for every section.
168
+ 6. **`geometra_expand_section`** fetches richer details only for the section you care about (fields, actions, headings, nested lists, list items, text preview).
169
+ 7. After interactions, action tools return a **semantic delta** when possible (dialogs opened/closed, forms appeared/removed, list counts changed, named/focusable nodes added/removed/updated). If nothing meaningful changed, they fall back to a short current-UI overview.
170
+ 8. Tools expose query, click, type, snapshot, page-model, and section-expansion operations over this structured data.
171
+ 9. After each interaction, the peer sends updated geometry (full `frame` or `patch`) — the MCP tools interpret that into compact summaries.
163
172
 
164
173
  With a **native** Geometra server, layout comes from Textura/Yoga. With **`@geometra/proxy`**, layout comes from the browser’s computed DOM geometry; the MCP layer is the same.
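The README changes above describe two mutually exclusive ways to call `geometra_connect`: `url` for an already-running WebSocket peer, or `pageUrl` to have the MCP server start the proxy. The sketch below shows one way that "exactly one of" rule can be expressed. It mirrors the `hasUrl === hasPage` check in the `server.js` hunk later in this diff, but `ConnectInput`, `ConnectMode`, and `resolveConnectMode` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical input shape, modeled on the `url` / `pageUrl` options in the README.
interface ConnectInput {
  url?: string;
  pageUrl?: string;
}

// Hypothetical result type: which connection path the caller asked for.
type ConnectMode =
  | { kind: 'websocket'; url: string }
  | { kind: 'proxy'; pageUrl: string };

// Returns the requested mode, or throws when neither or both fields are set.
function resolveConnectMode(input: ConnectInput): ConnectMode {
  const hasUrl = typeof input.url === 'string' && input.url.length > 0;
  const hasPage = typeof input.pageUrl === 'string' && input.pageUrl.length > 0;
  // Equal flags mean "both set" or "both missing"; either case is rejected.
  if (hasUrl === hasPage) {
    throw new Error('Provide exactly one of: url (WebSocket) or pageUrl (https://…).');
  }
  return hasPage
    ? { kind: 'proxy', pageUrl: input.pageUrl as string }
    : { kind: 'websocket', url: input.url as string };
}
```

Comparing the two booleans for equality covers both failure cases in a single branch, which is why the check does not need separate "missing" and "conflicting" paths.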
@@ -1,5 +1,5 @@
1
1
  import { describe, expect, it } from 'vitest';
2
- import { buildPageModel, buildUiDelta, hasUiDelta, summarizeUiDelta, } from '../session.js';
2
+ import { buildPageModel, expandPageSection, buildUiDelta, hasUiDelta, summarizeUiDelta, } from '../session.js';
3
3
  function node(role, name, bounds, options) {
4
4
  return {
5
5
  role,
@@ -12,7 +12,7 @@ function node(role, name, bounds, options) {
12
12
  };
13
13
  }
14
14
  describe('buildPageModel', () => {
15
- it('extracts landmarks, forms, and lists from a typical webpage tree', () => {
15
+ it('builds a summary-first page model with stable section ids', () => {
16
16
  const tree = node('group', undefined, { x: 0, y: 0, width: 1024, height: 768 }, {
17
17
  children: [
18
18
  node('navigation', 'Primary nav', { x: 0, y: 0, width: 220, height: 80 }, { path: [0] }),
@@ -43,20 +43,82 @@ describe('buildPageModel', () => {
43
43
  });
44
44
  const model = buildPageModel(tree);
45
45
  expect(model.viewport).toEqual({ width: 1024, height: 768 });
46
- expect(model.landmarks.map(item => item.role)).toEqual(['navigation', 'main', 'form']);
46
+ expect(model.archetypes).toEqual(expect.arrayContaining(['shell', 'form', 'results']));
47
+ expect(model.summary).toEqual({
48
+ landmarkCount: 3,
49
+ formCount: 1,
50
+ dialogCount: 0,
51
+ listCount: 1,
52
+ focusableCount: 1,
53
+ });
54
+ expect(model.landmarks.map(item => item.id)).toEqual(['lm:0', 'lm:1', 'lm:1.0']);
47
55
  expect(model.forms).toHaveLength(1);
48
56
  expect(model.forms[0]).toMatchObject({
57
+ id: 'fm:1.0',
49
58
  name: 'Job application',
50
59
  fieldCount: 2,
51
60
  actionCount: 1,
52
61
  });
53
- expect(model.forms[0]?.fields.map(field => field.name)).toEqual(['Full name', 'Email']);
54
- expect(model.forms[0]?.actions.map(action => action.name)).toEqual(['Submit application']);
55
62
  expect(model.lists[0]).toMatchObject({
63
+ id: 'ls:1.1',
56
64
  name: 'Open roles',
57
65
  itemCount: 2,
58
- itemsPreview: ['Designer', 'Engineer'],
59
66
  });
67
+ expect(model.primaryActions).toEqual([
68
+ expect.objectContaining({
69
+ id: 'n:1.0.2',
70
+ role: 'button',
71
+ name: 'Submit application',
72
+ }),
73
+ ]);
74
+ });
75
+ it('expands a section by id on demand', () => {
76
+ const tree = node('group', undefined, { x: 0, y: 0, width: 1024, height: 768 }, {
77
+ children: [
78
+ node('main', undefined, { x: 0, y: 0, width: 1024, height: 768 }, {
79
+ path: [0],
80
+ children: [
81
+ node('form', 'Job application', { x: 40, y: 120, width: 520, height: 280 }, {
82
+ path: [0, 0],
83
+ children: [
84
+ node('heading', 'Application', { x: 60, y: 132, width: 200, height: 24 }, { path: [0, 0, 0] }),
85
+ node('textbox', 'Full name*', { x: 60, y: 160, width: 300, height: 36 }, { path: [0, 0, 1] }),
86
+ node('textbox', 'Email:', { x: 60, y: 208, width: 300, height: 36 }, { path: [0, 0, 2] }),
87
+ node('button', 'Submit application', { x: 60, y: 264, width: 180, height: 40 }, {
88
+ path: [0, 0, 3],
89
+ focusable: true,
90
+ }),
91
+ ],
92
+ }),
93
+ ],
94
+ }),
95
+ ],
96
+ });
97
+ const detail = expandPageSection(tree, 'fm:0.0');
98
+ expect(detail).toMatchObject({
99
+ id: 'fm:0.0',
100
+ kind: 'form',
101
+ role: 'form',
102
+ name: 'Application',
103
+ summary: {
104
+ headingCount: 1,
105
+ fieldCount: 2,
106
+ actionCount: 1,
107
+ },
108
+ });
109
+ expect(detail?.fields.map(field => field.name)).toEqual(['Full name', 'Email']);
110
+ expect(detail?.actions.map(action => action.id)).toEqual(['n:0.0.3']);
111
+ expect(detail?.fields[0]).not.toHaveProperty('bounds');
112
+ });
113
+ it('drops noisy container names and falls back to unnamed summaries', () => {
114
+ const tree = node('group', undefined, { x: 0, y: 0, width: 800, height: 600 }, {
115
+ children: [
116
+ node('form', 'First Name* Last Name* Email* Phone* Country* Location* Resume* LinkedIn*', { x: 20, y: 20, width: 500, height: 400 }, { path: [0] }),
117
+ ],
118
+ });
119
+ const model = buildPageModel(tree);
120
+ expect(model.forms[0]?.id).toBe('fm:0');
121
+ expect(model.forms[0]?.name).toBeUndefined();
60
122
  });
61
123
  });
62
124
  describe('buildUiDelta', () => {
@@ -115,14 +177,15 @@ describe('buildUiDelta', () => {
115
177
  const delta = buildUiDelta(before, after);
116
178
  expect(hasUiDelta(delta)).toBe(true);
117
179
  expect(delta.dialogsOpened).toHaveLength(1);
180
+ expect(delta.dialogsOpened[0]?.id).toBe('dg:0.2');
118
181
  expect(delta.dialogsOpened[0]?.name).toBe('Save complete');
119
182
  expect(delta.listCountsChanged).toEqual([
120
- { name: 'Results', path: [0, 1], beforeCount: 2, afterCount: 3 },
183
+ { id: 'ls:0.1', name: 'Results', beforeCount: 2, afterCount: 3 },
121
184
  ]);
122
185
  expect(delta.updated.some(update => update.after.name === 'Save' && update.changes.some(change => change.includes('disabled')))).toBe(true);
123
186
  const summary = summarizeUiDelta(delta);
124
- expect(summary).toContain('+ dialog "Save complete" opened');
125
- expect(summary).toContain('~ list "Results" items 2 -> 3');
126
- expect(summary).toContain('~ button "Save": disabled unset -> true');
187
+ expect(summary).toContain('+ dg:0.2 dialog "Save complete" opened');
188
+ expect(summary).toContain('~ ls:0.1 list "Results" items 2 -> 3');
189
+ expect(summary).toContain('~ n:0.0 button "Save": disabled unset -> true');
127
190
  });
128
191
  });
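The updated expectations above suggest a simple id scheme: a short kind prefix (`lm`, `fm`, `dg`, `ls`, `n`) followed by the node's tree path joined with dots, so a form at path `[0, 0]` becomes `fm:0.0` and its button at `[0, 0, 3]` becomes `n:0.0.3`. The following is a minimal sketch of that inferred format, not the package's actual implementation. `SectionKind`, `PREFIX`, and `stableId` are hypothetical names, and the behavior for an empty (root) path is not shown in the diff.

```typescript
// Kind names are illustrative; only the prefixes appear in the test expectations.
type SectionKind = 'landmark' | 'form' | 'dialog' | 'list' | 'node';

const PREFIX: Record<SectionKind, string> = {
  landmark: 'lm',
  form: 'fm',
  dialog: 'dg',
  list: 'ls',
  node: 'n',
};

// Hypothetical helper: prefix plus the dot-joined tree path.
// The id is derived purely from position, so it stays the same across
// snapshots as long as the node does not move within the tree.
function stableId(kind: SectionKind, path: number[]): string {
  return `${PREFIX[kind]}:${path.join('.')}`;
}

const formId = stableId('form', [0, 0]);      // "fm:0.0"
const buttonId = stableId('node', [0, 0, 3]); // "n:0.0.3"
```

Because such an id encodes position rather than content, it would change if a sibling were inserted before the node, so "stable" here presumably means stable between calls on an unchanged tree.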
@@ -0,0 +1,20 @@
1
+ import { type ChildProcess } from 'node:child_process';
2
+ /** Resolve bundled @geometra/proxy CLI entry (dist/index.js). */
3
+ export declare function resolveProxyScriptPath(): string;
4
+ /** Prefer `preferred` when free; otherwise an ephemeral port on 127.0.0.1. */
5
+ export declare function pickFreePort(preferred?: number): Promise<number>;
6
+ export interface SpawnProxyParams {
7
+ pageUrl: string;
8
+ port: number;
9
+ headless?: boolean;
10
+ width?: number;
11
+ height?: number;
12
+ slowMo?: number;
13
+ }
14
+ /**
15
+ * Spawn geometra-proxy as a child process and resolve when the WebSocket is listening.
16
+ */
17
+ export declare function spawnGeometraProxy(opts: SpawnProxyParams): Promise<{
18
+ child: ChildProcess;
19
+ wsUrl: string;
20
+ }>;
@@ -0,0 +1,110 @@
1
+ import { spawn } from 'node:child_process';
2
+ import { createRequire } from 'node:module';
3
+ import { createServer } from 'node:net';
4
+ import path from 'node:path';
5
+ const require = createRequire(import.meta.url);
6
+ /** Resolve bundled @geometra/proxy CLI entry (dist/index.js). */
7
+ export function resolveProxyScriptPath() {
8
+ try {
9
+ const pkgJson = require.resolve('@geometra/proxy/package.json');
10
+ return path.join(path.dirname(pkgJson), 'dist/index.js');
11
+ }
12
+ catch {
13
+ throw new Error('Could not resolve @geometra/proxy. Install it with the MCP package: npm install @geometra/proxy');
14
+ }
15
+ }
16
+ function canBindPort(p) {
17
+ return new Promise(resolve => {
18
+ const s = createServer();
19
+ s.once('error', () => resolve(false));
20
+ s.listen(p, '127.0.0.1', () => {
21
+ s.close(() => resolve(true));
22
+ });
23
+ });
24
+ }
25
+ function getEphemeralPort() {
26
+ return new Promise((resolve, reject) => {
27
+ const s = createServer();
28
+ s.once('error', reject);
29
+ s.listen(0, '127.0.0.1', () => {
30
+ const a = s.address();
31
+ s.close(err => {
32
+ if (err) {
33
+ reject(err);
34
+ return;
35
+ }
36
+ if (typeof a === 'object' && a !== null && 'port' in a)
37
+ resolve(a.port);
38
+ else
39
+ reject(new Error('Could not allocate ephemeral port'));
40
+ });
41
+ });
42
+ });
43
+ }
44
+ /** Prefer `preferred` when free; otherwise an ephemeral port on 127.0.0.1. */
45
+ export async function pickFreePort(preferred) {
46
+ if (preferred != null && preferred > 0 && preferred <= 65535) {
47
+ if (await canBindPort(preferred))
48
+ return preferred;
49
+ }
50
+ return getEphemeralPort();
51
+ }
52
+ const LISTEN_RE = /WebSocket listening on (ws:\/\/127\.0\.0\.1:\d+)/;
53
+ /**
54
+ * Spawn geometra-proxy as a child process and resolve when the WebSocket is listening.
55
+ */
56
+ export function spawnGeometraProxy(opts) {
57
+ const script = resolveProxyScriptPath();
58
+ const args = [script, opts.pageUrl, '--port', String(opts.port)];
59
+ if (opts.width != null && opts.width > 0)
60
+ args.push('--width', String(opts.width));
61
+ if (opts.height != null && opts.height > 0)
62
+ args.push('--height', String(opts.height));
63
+ if (opts.slowMo != null && opts.slowMo > 0)
64
+ args.push('--slow-mo', String(opts.slowMo));
65
+ if (opts.headless === true)
66
+ args.push('--headless');
67
+ else if (opts.headless === false)
68
+ args.push('--headed');
69
+ return new Promise((resolve, reject) => {
70
+ const child = spawn(process.execPath, args, {
71
+ stdio: ['ignore', 'pipe', 'pipe'],
72
+ env: { ...process.env },
73
+ });
74
+ let settled = false;
75
+ let stderrBuf = '';
76
+ const deadline = setTimeout(() => {
77
+ if (!settled) {
78
+ settled = true;
79
+ child.kill('SIGTERM');
80
+ reject(new Error('geometra-proxy did not report a listening WebSocket within 45s'));
81
+ }
82
+ }, 45_000);
83
+ const flushStderr = (chunk) => {
84
+ stderrBuf += chunk.toString();
85
+ const m = stderrBuf.match(LISTEN_RE);
86
+ if (m && !settled) {
87
+ settled = true;
88
+ clearTimeout(deadline);
89
+ child.stderr?.removeAllListeners('data');
90
+ resolve({ child, wsUrl: m[1] });
91
+ }
92
+ };
93
+ child.stderr?.on('data', flushStderr);
94
+ child.on('error', err => {
95
+ if (!settled) {
96
+ settled = true;
97
+ clearTimeout(deadline);
98
+ reject(err);
99
+ }
100
+ });
101
+ child.on('exit', (code, sig) => {
102
+ if (!settled) {
103
+ settled = true;
104
+ clearTimeout(deadline);
105
+ const tail = stderrBuf.trim().slice(-2000);
106
+ reject(new Error(`geometra-proxy exited before ready (code=${code} signal=${sig}). Stderr (tail): ${tail || '(empty)'}`));
107
+ }
108
+ });
109
+ });
110
+ }
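The new file above waits for the child process to print a "WebSocket listening on ..." line on stderr, and it appends every chunk to a buffer before matching. The sketch below isolates that step to show why the buffer matters: stderr arrives in arbitrary chunks, so the line may be split across two `data` events. `createReadyDetector` is a hypothetical helper written for this illustration; only the regular expression is taken from the diff.

```typescript
// Same pattern as LISTEN_RE in the file above.
const LISTEN_RE = /WebSocket listening on (ws:\/\/127\.0\.0\.1:\d+)/;

// Hypothetical helper: accumulates chunks and returns the WebSocket URL
// once the buffered text contains the full "listening" line.
function createReadyDetector(): (chunk: string) => string | undefined {
  let buf = '';
  return (chunk: string) => {
    buf += chunk;
    const m = buf.match(LISTEN_RE);
    return m ? m[1] : undefined;
  };
}

const feed = createReadyDetector();
// The line is split mid-address, so the first chunk alone cannot match.
const first = feed('WebSocket listening on ws://127.0.');
// With the rest appended, the buffered text matches and yields the URL.
const second = feed('0.1:3200\n');
```

One caveat applies to this pattern in general: if a chunk boundary falls inside the port digits, `\d+` matches the partial port that has arrived so far. Requiring a trailing newline in the pattern would be one way to guard against that.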
package/dist/server.js CHANGED
@@ -1,78 +1,148 @@
1
1
  import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
2
2
  import { z } from 'zod';
3
- import { connect, disconnect, getSession, sendClick, sendType, sendKey, sendFileUpload, sendListboxPick, sendSelectOption, sendWheel, buildA11yTree, buildCompactUiIndex, buildPageModel, buildUiDelta, hasUiDelta, summarizeCompactIndex, summarizePageModel, summarizeUiDelta, } from './session.js';
3
+ import { connect, connectThroughProxy, disconnect, getSession, sendClick, sendType, sendKey, sendFileUpload, sendListboxPick, sendSelectOption, sendWheel, buildA11yTree, buildCompactUiIndex, buildPageModel, expandPageSection, buildUiDelta, hasUiDelta, nodeIdForPath, summarizeCompactIndex, summarizePageModel, summarizeUiDelta, } from './session.js';
4
4
  export function createServer() {
5
- const server = new McpServer({ name: 'geometra', version: '0.1.0' }, { capabilities: { tools: {} } });
5
+ const server = new McpServer({ name: 'geometra', version: '1.19.0' }, { capabilities: { tools: {} } });
6
6
  // ── connect ──────────────────────────────────────────────────
7
- server.tool('geometra_connect', `Connect to a running Geometra server over WebSocket. This replaces Playwright/browser automation you get direct access to the UI's pixel-exact geometry as JSON.
7
+ server.tool('geometra_connect', `Connect to a Geometra WebSocket peer, or start \`geometra-proxy\` automatically for a normal web page.
8
8
 
9
- Call this first before using any other geometra tools. The peer must be listening (native Geometra server, or \`geometra-proxy\` for real web pages). File upload / wheel / native \`<select>\` require \`@geometra/proxy\`; native Textura servers return an error for those messages.`, {
10
- url: z.string().describe('WebSocket URL of the Geometra server (e.g. ws://localhost:3100)'),
11
- }, async ({ url }) => {
9
+ **Prefer \`pageUrl\` for job sites and SPAs:** pass \`https://…\` and this server spawns geometra-proxy, picks a free port, and connects you do **not** need a separate terminal or a \`ws://\` URL (fewer IDE approval steps for the human).
10
+
11
+ Use \`url\` (ws://…) only when a Geometra/native server or an already-running proxy is listening.
12
+
13
+ Chromium opens **visible** by default unless \`headless: true\`. File upload / wheel / native \`<select>\` need the proxy path (\`pageUrl\` or ws to proxy).`, {
14
+ url: z
15
+ .string()
16
+ .optional()
17
+ .describe('WebSocket URL when a server is already running (e.g. ws://127.0.0.1:3200 or ws://localhost:3100). Omit if using pageUrl.'),
18
+ pageUrl: z
19
+ .string()
20
+ .url()
21
+ .optional()
22
+ .describe('HTTP(S) page to open. MCP starts geometra-proxy and connects automatically. Use this instead of url for most web apply flows.'),
23
+ port: z
24
+ .number()
25
+ .int()
26
+ .positive()
27
+ .max(65535)
28
+ .optional()
29
+ .describe('Local port for spawned proxy (default: ephemeral free port).'),
30
+ headless: z
31
+ .boolean()
32
+ .optional()
33
+ .describe('Run Chromium headless (default false = visible window).'),
34
+ width: z.number().int().positive().optional().describe('Viewport width for spawned proxy.'),
35
+ height: z.number().int().positive().optional().describe('Viewport height for spawned proxy.'),
36
+ slowMo: z
37
+ .number()
38
+ .int()
39
+ .nonnegative()
40
+ .optional()
41
+ .describe('Playwright slowMo (ms) on spawned proxy for easier visual following.'),
42
+ }, async (input) => {
12
43
  try {
13
- const session = await connect(url);
44
+ const hasUrl = typeof input.url === 'string' && input.url.length > 0;
45
+ const hasPage = typeof input.pageUrl === 'string' && input.pageUrl.length > 0;
46
+ if (hasUrl === hasPage) {
47
+ return err('Provide exactly one of: url (WebSocket) or pageUrl (https://…).');
48
+ }
49
+ if (hasPage) {
50
+ const session = await connectThroughProxy({
51
+ pageUrl: input.pageUrl,
52
+ port: input.port,
53
+ headless: input.headless,
54
+ width: input.width,
55
+ height: input.height,
56
+ slowMo: input.slowMo,
57
+ });
58
+ const summary = compactSessionSummary(session);
59
+ return ok(`Started geometra-proxy and connected at ${session.url} (page: ${input.pageUrl}). UI state:\n${summary}`);
60
+ }
61
+ const session = await connect(input.url);
14
62
  const summary = compactSessionSummary(session);
15
- return ok(`Connected to ${url}. UI state:\n${summary}`);
63
+ return ok(`Connected to ${input.url}. UI state:\n${summary}`);
16
64
  }
17
65
  catch (e) {
18
66
  return err(`Failed to connect: ${e.message}`);
19
67
  }
20
68
  });
21
69
  // ── query ────────────────────────────────────────────────────
22
- server.tool('geometra_query', `Find elements in the current Geometra UI by role, name, or text content. Returns matching elements with their exact pixel bounds {x, y, width, height}, role, name, and tree path.
70
+ server.tool('geometra_query', `Find elements in the current Geometra UI by stable id, role, name, or text content. Returns matching elements with their exact pixel bounds {x, y, width, height}, role, name, and tree path.
23
71
 
24
72
  This is the Geometra equivalent of Playwright's locator — but instant, structured, and with no browser. Use the returned bounds to click elements or assert on layout.`, {
73
+ id: z.string().optional().describe('Stable node id from geometra_snapshot or geometra_expand_section'),
25
74
  role: z.string().optional().describe('ARIA role to match (e.g. "button", "textbox", "text", "heading", "listitem")'),
26
75
  name: z.string().optional().describe('Accessible name to match (exact or substring)'),
27
76
  text: z.string().optional().describe('Text content to search for (substring match)'),
28
- }, async ({ role, name, text }) => {
77
+ }, async ({ id, role, name, text }) => {
29
78
  const session = getSession();
30
79
  if (!session?.tree || !session?.layout)
31
80
  return err('Not connected. Call geometra_connect first.');
32
81
  const a11y = buildA11yTree(session.tree, session.layout);
33
- const matches = findNodes(a11y, { role, name, text });
82
+ const matches = findNodes(a11y, { id, role, name, text });
34
83
  if (matches.length === 0) {
35
- return ok(`No elements found matching ${JSON.stringify({ role, name, text })}`);
84
+ return ok(`No elements found matching ${JSON.stringify({ id, role, name, text })}`);
36
85
  }
37
86
  const result = matches.map(formatNode);
38
87
  return ok(JSON.stringify(result, null, 2));
39
88
  });
40
89
  // ── page model ────────────────────────────────────────────────
41
- server.tool('geometra_page_model', `Get a higher-level webpage model instead of a raw node dump. Extracts common structures such as landmarks, forms, dialogs, and lists, with short previews of fields/actions/items.
90
+ server.tool('geometra_page_model', `Get a higher-level webpage summary instead of a raw node dump. Returns stable section ids, page archetypes, summary counts, top-level landmarks/forms/dialogs/lists, and a few primary actions.
42
91
 
43
- Use this first on normal HTML pages when you want to understand the page shape with fewer tokens than a full snapshot.`, {
44
- maxFieldsPerForm: z
92
+ Use this first on normal HTML pages when you want to understand the page shape with fewer tokens than a full snapshot. Then call geometra_expand_section on a returned section id when you need details.`, {
93
+ maxPrimaryActions: z
45
94
  .number()
46
95
  .int()
47
96
  .min(1)
48
- .max(40)
97
+ .max(12)
49
98
  .optional()
50
- .default(12)
51
- .describe('Cap returned fields per form (default 12).'),
52
- maxActionsPerContainer: z
99
+ .default(6)
100
+ .describe('Cap top-level primary actions (default 6).'),
101
+ maxSectionsPerKind: z
53
102
  .number()
54
103
  .int()
55
104
  .min(1)
56
- .max(20)
105
+ .max(16)
57
106
  .optional()
58
107
  .default(8)
59
- .describe('Cap returned actions per form/dialog (default 8).'),
60
- maxItemsPerList: z
61
- .number()
62
- .int()
63
- .min(1)
64
- .max(20)
65
- .optional()
66
- .default(5)
67
- .describe('Cap list item preview strings (default 5).'),
68
- }, async ({ maxFieldsPerForm, maxActionsPerContainer, maxItemsPerList }) => {
108
+ .describe('Cap returned landmarks/forms/dialogs/lists per kind (default 8).'),
109
+ }, async ({ maxPrimaryActions, maxSectionsPerKind }) => {
69
110
  const session = getSession();
70
111
  if (!session?.tree || !session?.layout)
71
112
  return err('Not connected. Call geometra_connect first.');
72
113
  const a11y = buildA11yTree(session.tree, session.layout);
73
- const model = buildPageModel(a11y, { maxFieldsPerForm, maxActionsPerContainer, maxItemsPerList });
114
+ const model = buildPageModel(a11y, { maxPrimaryActions, maxSectionsPerKind });
74
115
  return ok(JSON.stringify(model));
75
116
  });
117
+ server.tool('geometra_expand_section', `Expand one section from geometra_page_model by stable id. Returns richer on-demand details such as headings, fields, actions, nested lists, list items, and text preview.
118
+
119
+ Use this after geometra_page_model when you know which form/dialog/list/landmark you want to inspect more closely. Per-item bounds are omitted by default to save tokens; set includeBounds=true if you need them immediately.`, {
120
+ id: z.string().describe('Section id from geometra_page_model, e.g. fm:1.0 or ls:2.1'),
121
+ maxHeadings: z.number().int().min(1).max(20).optional().default(6).describe('Cap heading rows'),
122
+ maxFields: z.number().int().min(1).max(40).optional().default(18).describe('Cap field rows'),
123
+ maxActions: z.number().int().min(1).max(30).optional().default(12).describe('Cap action rows'),
124
+ maxLists: z.number().int().min(0).max(20).optional().default(8).describe('Cap nested lists'),
125
+ maxItems: z.number().int().min(0).max(50).optional().default(20).describe('Cap list items'),
126
+ maxTextPreview: z.number().int().min(0).max(20).optional().default(6).describe('Cap text preview lines'),
127
+ includeBounds: z.boolean().optional().default(false).describe('Include bounds for fields/actions/headings/items'),
128
+ }, async ({ id, maxHeadings, maxFields, maxActions, maxLists, maxItems, maxTextPreview, includeBounds }) => {
129
+ const session = getSession();
130
+ if (!session?.tree || !session?.layout)
131
+ return err('Not connected. Call geometra_connect first.');
132
+ const a11y = buildA11yTree(session.tree, session.layout);
133
+ const detail = expandPageSection(a11y, id, {
134
+ maxHeadings,
135
+ maxFields,
136
+ maxActions,
137
+ maxLists,
138
+ maxItems,
139
+ maxTextPreview,
140
+ includeBounds,
141
+ });
142
+ if (!detail)
143
+ return err(`No expandable section found for id ${id}`);
144
+ return ok(JSON.stringify(detail));
145
+ });
76
146
  // ── click ────────────────────────────────────────────────────
77
147
  server.tool('geometra_click', `Click an element in the Geometra UI. Provide either the element's bounds (from geometra_query) or raw x,y coordinates. The click is dispatched server-side via the geometry protocol — no browser, no simulated DOM events.
78
148
 
@@ -222,7 +292,7 @@ Custom React/Vue dropdowns are not supported — open them with geometra_click a
222
292
  // ── snapshot ─────────────────────────────────────────────────
223
293
  server.tool('geometra_snapshot', `Get the current UI as JSON. Default **compact** view: flat list of viewport-visible actionable nodes (links, buttons, inputs, headings, landmarks, text leaves, focusable elements) with bounds and tree paths — far fewer tokens than a full nested tree. Use **full** for complete nested a11y + every wrapper when debugging layout.
224
294
 
225
- JSON is minified in compact view to save tokens. For a webpage-shaped overview (forms, dialogs, lists, landmarks), use geometra_page_model.`, {
295
+ JSON is minified in compact view to save tokens. For a summary-first overview, use geometra_page_model, then geometra_expand_section for just the part you want.`, {
226
296
  view: z
227
297
  .enum(['compact', 'full'])
228
298
  .optional()
@@ -309,13 +379,15 @@ function findNodes(node, filter) {
309
379
  const matches = [];
310
380
  function walk(n) {
311
381
  let match = true;
382
+ if (filter.id && nodeIdForPath(n.path) !== filter.id)
383
+ match = false;
312
384
  if (filter.role && n.role !== filter.role)
313
385
  match = false;
314
386
  if (filter.name && (!n.name || !n.name.includes(filter.name)))
315
387
  match = false;
316
388
  if (filter.text && (!n.name || !n.name.includes(filter.text)))
317
389
  match = false;
318
- if (match && (filter.role || filter.name || filter.text))
390
+ if (match && (filter.id || filter.role || filter.name || filter.text))
319
391
  matches.push(n);
320
392
  for (const child of n.children)
321
393
  walk(child);
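The hunk above adds `id` as one more filter in `findNodes`. All supplied filters must hold at once, and a filter with no criteria matches nothing. The sketch below is a self-contained reduction of that logic under stated assumptions: `UiNode`, `NodeFilter`, `idForPath`, and `findMatching` are hypothetical names, and the id format (`n:` plus the dot-joined path) is inferred from the test expectations earlier in this diff rather than from the real `nodeIdForPath`.

```typescript
interface UiNode {
  role: string;
  name?: string;
  path: number[];
  children: UiNode[];
}

interface NodeFilter {
  id?: string;
  role?: string;
  name?: string;
}

// Assumption: ids look like "n:0.1" for path [0, 1].
function idForPath(path: number[]): string {
  return `n:${path.join('.')}`;
}

// Depth-first walk; a node matches only if every supplied criterion holds.
function findMatching(root: UiNode, filter: NodeFilter): UiNode[] {
  const matches: UiNode[] = [];
  const hasCriteria = Boolean(filter.id || filter.role || filter.name);
  const walk = (n: UiNode): void => {
    const ok =
      (!filter.id || idForPath(n.path) === filter.id) &&
      (!filter.role || n.role === filter.role) &&
      (!filter.name || (n.name ?? '').includes(filter.name));
    if (ok && hasCriteria) matches.push(n);
    n.children.forEach(child => walk(child));
  };
  walk(root);
  return matches;
}

// Small illustrative tree: a form with one textbox and one button.
const tree: UiNode = {
  role: 'form',
  name: 'Job application',
  path: [0],
  children: [
    { role: 'textbox', name: 'Email', path: [0, 0], children: [] },
    { role: 'button', name: 'Save', path: [0, 1], children: [] },
  ],
};

const byId = findMatching(tree, { id: 'n:0.1' });
```

Here `byId` contains only the "Save" button. Adding `role: 'textbox'` to the same filter would return nothing, because the id and role criteria are combined with AND.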
@@ -325,6 +397,7 @@ function findNodes(node, filter) {
325
397
  }
326
398
  function formatNode(node) {
327
399
  return {
400
+ id: nodeIdForPath(node.path),
328
401
  role: node.role,
329
402
  name: node.name,
330
403
  bounds: node.bounds,