screenhand 0.1.0

package/LICENSE ADDED
MIT License

Copyright (c) 2025 Khushi Singhal

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md ADDED
<div align="center">

# ScreenHand

**Give AI eyes and hands on your desktop.**

An open-source [MCP server](https://modelcontextprotocol.io/) that lets Claude (and any AI agent) see your screen, click buttons, type text, and control any app — on both macOS and Windows.

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![npm: screenhand](https://img.shields.io/npm/v/screenhand)](https://www.npmjs.com/package/screenhand)
[![Platform: macOS & Windows](https://img.shields.io/badge/Platform-macOS%20%7C%20Windows-green)]()
[![MCP Compatible](https://img.shields.io/badge/MCP-Compatible-purple)]()

[Website](https://screenhand.com) | [Quick Start](#quick-start) | [Tools](#tools) | [FAQ](#faq)

</div>

---

## What is ScreenHand?

ScreenHand is a **desktop automation bridge for AI**. It connects AI assistants like Claude to your operating system so they can:

- **See** your screen via screenshots and OCR
- **Read** UI elements via Accessibility APIs (macOS) or UI Automation (Windows)
- **Click** buttons, menus, and links
- **Type** text into any input field
- **Control** Chrome tabs via the DevTools Protocol
- **Run** AppleScript commands (macOS)

It works as an [MCP (Model Context Protocol)](https://modelcontextprotocol.io/) server, meaning any MCP-compatible AI client can use it out of the box.

## Why ScreenHand?

| Problem | ScreenHand Solution |
|---|---|
| AI can't see your screen | Screenshots + OCR return all visible text |
| AI can't click UI elements | Accessibility API finds and clicks elements in ~50ms |
| AI can't control browsers | Chrome DevTools Protocol gives full page control |
| AI can't automate workflows | 25+ tools for cross-app automation |
| Only works on one OS | Native bridges for both macOS and Windows |

## Quick Start

```bash
git clone https://github.com/manushi4/screenhand.git
cd screenhand
npm install
npm run build:native             # macOS — builds Swift bridge
# npm run build:native:windows   # Windows — builds .NET bridge
```

Then connect ScreenHand to your AI client.

### Claude Desktop

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["tsx", "/path/to/screenhand/src/mcp-entry.ts"]
    }
  }
}
```

### Claude Code

Add to your project `.mcp.json` or `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["tsx", "/path/to/screenhand/src/mcp-entry.ts"]
    }
  }
}
```

### Cursor

Add to `.cursor/mcp.json` in your project (or `~/.cursor/mcp.json` for global):

```json
{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["tsx", "/path/to/screenhand/src/mcp-entry.ts"]
    }
  }
}
```

### OpenAI Codex CLI

Add to `~/.codex/config.toml`:

```toml
[mcp.screenhand]
command = "npx"
args = ["tsx", "/path/to/screenhand/src/mcp-entry.ts"]
transport = "stdio"
```

### OpenClaw

Add to your `openclaw.json`:

```json
{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["tsx", "/path/to/screenhand/src/mcp-entry.ts"]
    }
  }
}
```

> **Why?** OpenClaw's built-in desktop control sends a screenshot to an LLM for every click (~3-5s, costs an API call). ScreenHand uses native Accessibility APIs — `press('Send')` runs in ~50ms with zero AI calls. See the full [integration guide](docs/openclaw-integration.md).

### Any MCP Client

ScreenHand is a standard MCP server over stdio. It works with any MCP-compatible client — just point it at `src/mcp-entry.ts`.

Replace `/path/to/screenhand` with the actual path where you cloned the repo.

## Tools

ScreenHand exposes 25+ tools organized by category.

### See the Screen

| Tool | What it does | Speed |
|------|-------------|-------|
| `screenshot` | Full screenshot + OCR — returns all visible text | ~600ms |
| `screenshot_file` | Screenshot saved to file (for viewing the image) | ~400ms |
| `ocr` | OCR with element positions and bounding boxes | ~600ms |

### Control Any App (Accessibility / UI Automation)

| Tool | What it does | Speed |
|------|-------------|-------|
| `apps` | List running apps with bundle IDs and PIDs | ~10ms |
| `windows` | List visible windows with positions and sizes | ~10ms |
| `focus` | Bring an app to the front | ~10ms |
| `launch` | Launch an app by bundle ID or name | ~1s |
| `ui_tree` | Full UI element tree — instant, no OCR needed | ~50ms |
| `ui_find` | Find a UI element by text or title | ~50ms |
| `ui_press` | Click a UI element by its title | ~50ms |
| `ui_set_value` | Set the value of a text field, slider, etc. | ~50ms |
| `menu_click` | Click a menu bar item by path | ~100ms |

### Keyboard and Mouse

| Tool | What it does |
|------|-------------|
| `click` | Click at screen coordinates |
| `click_text` | Find text via OCR and click it (fallback) |
| `type_text` | Type text via keyboard |
| `key` | Key combo (e.g. `cmd+s`, `ctrl+shift+n`) |
| `drag` | Drag from point A to point B |
| `scroll` | Scroll at a position |

### Chrome Browser (CDP)

| Tool | What it does |
|------|-------------|
| `browser_tabs` | List all open Chrome tabs |
| `browser_open` | Open a URL in a new tab |
| `browser_navigate` | Navigate the active tab to a URL |
| `browser_js` | Run JavaScript in a tab |
| `browser_dom` | Query the DOM with CSS selectors |
| `browser_click` | Click an element by CSS selector (uses CDP mouse events) |
| `browser_type` | Type into an input field (uses CDP keyboard events, React-compatible) |
| `browser_wait` | Wait for a page condition |
| `browser_page_info` | Get the page title, URL, and content |

### Anti-Detection & Stealth (CDP)

Tools for interacting with sites that have bot detection (Instagram, LinkedIn, etc.):

| Tool | What it does |
|------|-------------|
| `browser_stealth` | Inject anti-detection patches (hides the webdriver flag, fakes plugins/languages) |
| `browser_fill_form` | Human-like typing with random delays via CDP keyboard events |
| `browser_human_click` | Realistic mouse event sequence (mouseMoved → mousePressed → mouseReleased) |

> **Tip:** Call `browser_stealth` once after navigating to a protected site. Then use `browser_fill_form` and `browser_human_click` for interactions. The regular `browser_type` and `browser_click` also use CDP Input events now.

### Platform Playbooks (lazy-loaded)

Pre-built automation knowledge for specific platforms — selectors, URLs, flows, and **error solutions**.

| Tool | What it does |
|------|-------------|
| `platform_guide` | Get the automation guide for a platform (selectors, URLs, flows, errors + solutions) |
| `export_playbook` | Auto-generate a playbook from your session. Share it to help others. |

```
platform_guide({ platform: "devpost", section: "errors" })     # Just errors + solutions
platform_guide({ platform: "devpost", section: "selectors" })  # All CSS selectors
platform_guide({ platform: "devpost", section: "flows" })      # Step-by-step workflows
platform_guide({ platform: "devpost" })                        # Full playbook
```

**Contributing playbooks:** After automating any site, run:
```
export_playbook({ platform: "twitter", domain: "twitter.com" })
```
This auto-extracts URLs, selectors, and errors with their solutions from your session and saves a ready-to-share `playbooks/twitter.json`.

Available platforms: `devpost`. Add more by running `export_playbook` or by creating JSON files in `playbooks/`.

Zero performance cost — playbook files are only read when `platform_guide` is called.

### AppleScript (macOS only)

| Tool | What it does |
|------|-------------|
| `applescript` | Run any AppleScript command |

### Memory (Learning) — zero-config, zero-latency

ScreenHand gets smarter every time you use it — **no manual setup needed**.

**What happens automatically:**
- Every tool call is logged (async, non-blocking — adds ~0ms to response time)
- After 3+ consecutive successes, the winning sequence is saved as a reusable strategy
- Known error patterns are tracked with resolutions (e.g. "launch times out → use focus() instead")
- On every tool call, the response includes **auto-recall hints**:
  - Error warnings if the tool has failed before
  - Next-step suggestions if you're midway through a known strategy

**Predefined seed strategies:**
- Ships with 12 common macOS workflows (Photo Booth, Chrome navigation, copy/paste, Finder, export PDF, etc.)
- Loaded automatically on first boot — the system has knowledge from day one
- Seeds are searchable via `memory_recall` and provide next-step hints like any learned strategy

**Background web research:**
- When a tool fails and no resolution exists, ScreenHand searches for a fix in the background (non-blocking)
- Uses the Claude API (Haiku, if `ANTHROPIC_API_KEY` is set) or DuckDuckGo instant answers as a fallback
- Resolutions are saved to both the error cache and the strategy store — zero-latency recall next time
- Completely silent and fire-and-forget — never blocks tool responses or throws errors
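
The fire-and-forget behavior described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not ScreenHand's actual internals; `researchFix` and `scheduleResearch` are hypothetical names:

```typescript
// Hypothetical stand-in for the background lookup (Claude API or DuckDuckGo).
// It may resolve with a fix, resolve with null, or reject.
async function researchFix(errorText: string): Promise<string | null> {
  // A real implementation would query an external service with errorText here.
  return null;
}

// Kick off research without awaiting it, and swallow any failure,
// so the tool response is never blocked and never sees a background error.
function scheduleResearch(errorText: string, save: (fix: string) => void): void {
  void researchFix(errorText) // intentionally not awaited
    .then((fix) => { if (fix) save(fix); })
    .catch(() => { /* silent by design: background failures never surface */ });
}
```

The key design point is the `.catch(() => {})`: without it, a rejected background promise would surface as an unhandled rejection instead of staying silent.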

**Fingerprint matching & feedback loop:**
- Each strategy is fingerprinted by its tool sequence (e.g. `apps→focus→ui_press`)
- O(1) exact-match lookup when the agent follows a known sequence
- Success/failure outcomes are tracked per strategy — unreliable strategies are auto-penalized and eventually skipped
- Keyword-based fuzzy search with reliability scoring for `memory_recall`
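
A minimal TypeScript sketch of the fingerprint-and-recall idea (the names `fingerprint` and `StrategyStore` are illustrative, not ScreenHand's actual implementation):

```typescript
type Strategy = { steps: string[]; successes: number; failures: number };

// Join the tool sequence into a stable key, e.g. "apps→focus→ui_press".
function fingerprint(steps: string[]): string {
  return steps.join("→");
}

class StrategyStore {
  private byFingerprint = new Map<string, Strategy>();

  // Save a sequence once; repeated saves return the existing strategy.
  save(steps: string[]): Strategy {
    const key = fingerprint(steps);
    const existing = this.byFingerprint.get(key);
    if (existing) return existing;
    const strategy: Strategy = { steps, successes: 0, failures: 0 };
    this.byFingerprint.set(key, strategy);
    return strategy;
  }

  // O(1) exact-match recall: one hash-map read per known sequence.
  recall(steps: string[]): Strategy | undefined {
    return this.byFingerprint.get(fingerprint(steps));
  }
}
```

Because the fingerprint is just a joined string, lookup cost is independent of how many strategies are stored; reliability scoring would then read the `successes`/`failures` counters on the returned strategy.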

**Production-grade under the hood:**
- All data is cached in RAM at startup — lookups are ~0ms, disk is only for persistence
- Disk writes are async and buffered (100ms debounce) — they never block tool calls
- Sync flush on process exit (SIGINT/SIGTERM) — no lost writes
- Per-line JSONL parsing — corrupted lines are skipped, not fatal
- LRU eviction: 500 strategies and 200 error patterns max (oldest evicted automatically)
- File locking (`.lock` + PID) prevents corruption from concurrent instances
- The action log auto-rotates at 10 MB
- Data lives in `.screenhand/memory/` as JSONL (grep-friendly, no database)
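
The corruption-tolerant JSONL parsing can be sketched like this (a hypothetical `parseJsonl` helper, shown only to illustrate the skip-bad-lines behavior):

```typescript
// Parse JSONL text line by line. A corrupted line is skipped rather than
// fatal, so one bad write never poisons the rest of the file.
function parseJsonl<T>(text: string): T[] {
  const records: T[] = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue; // ignore blank lines
    try {
      records.push(JSON.parse(line) as T);
    } catch {
      // Corrupted line: drop it, keep every other entry.
    }
  }
  return records;
}
```

This is the practical reason to prefer one JSON object per line over a single big JSON array: with an array, one corrupted byte makes the whole file unparseable.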

| Tool | What it does |
|------|-------------|
| `memory_recall` | Explicitly search past strategies by task description |
| `memory_save` | Manually save the current session (auto-save handles most cases) |
| `memory_errors` | View all known error patterns and their resolutions |
| `memory_stats` | Action counts, success rates, top tools, disk usage |
| `memory_clear` | Clear actions, strategies, errors, or all data |

## How It Works

ScreenHand has three layers:

```
AI Client (Claude, Cursor, etc.)
        ↓ MCP protocol (stdio)
ScreenHand MCP Server (TypeScript)
        ↓ JSON-RPC (stdio)
Native Bridge (Swift on macOS / C# on Windows)
        ↓ Platform APIs
Operating System (Accessibility, CoreGraphics, UI Automation, SendInput)
```

1. **Native bridge** — talks directly to OS-level APIs:
   - **macOS**: Swift binary using Accessibility APIs, CoreGraphics, and the Vision framework (OCR)
   - **Windows**: C# (.NET 8) binary using UI Automation, SendInput, GDI+, and Windows.Media.Ocr
2. **TypeScript MCP server** — routes tools to the correct bridge, handles Chrome CDP, manages sessions
3. **MCP protocol** — standard Model Context Protocol so any AI client can connect

The native bridge is auto-selected based on your OS. Both bridges speak the same JSON-RPC protocol, so all tools work identically on both platforms.
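
As a rough illustration of that shared protocol, the sketch below frames a JSON-RPC 2.0 request and sends it to a spawned bridge binary over stdio. The method name (`"screenshot"`) and the one-message-per-line framing are assumptions for illustration; the real wire format may differ:

```typescript
import { spawn, type ChildProcess } from "node:child_process";

// Frame one JSON-RPC 2.0 request as a single newline-terminated line.
function frameRequest(id: number, method: string, params: object): string {
  return JSON.stringify({ jsonrpc: "2.0", id, method, params }) + "\n";
}

// Spawn a native bridge binary and write one request to its stdin.
// Replies would arrive on stdout, also as newline-delimited JSON.
function callBridge(binaryPath: string, method: string, params: object): ChildProcess {
  const bridge = spawn(binaryPath, { stdio: ["pipe", "pipe", "inherit"] });
  bridge.stdin!.write(frameRequest(1, method, params));
  return bridge;
}
```

Because both bridges accept the same framed requests, the TypeScript server never needs platform-specific call sites; only the spawned binary differs.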

## Use Cases

### App Debugging
Claude reads UI trees, clicks through flows, and checks element states — faster than clicking around yourself.

### Design Inspection
Screenshots + OCR read exactly what's on screen. `ui_tree` shows component structure like React DevTools, but for any native app.

### Browser Automation
Fill forms, scrape data, run JavaScript, navigate pages — all through the Chrome DevTools Protocol.

### Cross-App Workflows
Read from one app, paste into another, chain actions across your whole desktop. Example: extract data from a spreadsheet, search it in Chrome, paste the results into Notes.

### UI Testing
Click buttons, verify text appears, catch visual regressions — all driven by AI.

## Requirements

### macOS

- macOS 12+
- Node.js 18+
- Accessibility permissions: System Settings > Privacy & Security > Accessibility > enable your terminal
- Chrome launched with `--remote-debugging-port=9222` (only for browser tools)

### Windows

- Windows 10 (1809+)
- Node.js 18+
- [.NET 8 SDK](https://dotnet.microsoft.com/download/dotnet/8.0)
- No special permissions needed — UI Automation works without admin
- Chrome launched with `--remote-debugging-port=9222` (only for browser tools)
- Build: `npm run build:native:windows`

## Skills (Slash Commands)

ScreenHand ships with Claude Code slash commands:

- `/screenshot` — capture your screen and describe what's visible
- `/debug-ui` — inspect the UI tree of any app
- `/automate` — describe a task and Claude does it

**Install globally** so they work in any project:

```bash
./install-skills.sh
```

## Development

```bash
npm run check                 # type-check (covers all entry files)
npm test                      # run test suite (95 tests)
npm run build                 # compile TypeScript
npm run build:native          # build Swift bridge (macOS)
npm run build:native:windows  # build .NET bridge (Windows)
```

## FAQ

### What is ScreenHand?
ScreenHand is an open-source MCP server that gives AI assistants like Claude the ability to see and control your desktop. It provides 25+ tools for screenshots, UI inspection, clicking, typing, and browser automation on both macOS and Windows.

### How does ScreenHand differ from Anthropic's Computer Use?
Anthropic's Computer Use is a cloud-based feature built into Claude. ScreenHand is an open-source, local-first tool that runs entirely on your machine with no cloud dependency. It uses native OS APIs (Accessibility on macOS, UI Automation on Windows), which are faster and more reliable than screenshot-based approaches.

### How does ScreenHand differ from OpenClaw?
OpenClaw is a general-purpose AI agent that controls your computer by looking at the screen — it takes screenshots, interprets them with an LLM, then simulates mouse/keyboard input. ScreenHand takes a fundamentally different approach:

| | ScreenHand | OpenClaw |
|---|---|---|
| **How it sees the UI** | Native Accessibility/UI Automation APIs — reads the actual element tree | Screenshots + LLM vision — interprets pixels |
| **Speed** | ~50ms per UI action | Seconds per action (screenshot → LLM → click) |
| **Accuracy** | Exact element targeting by role/title | Coordinate-based — can misclick if the layout shifts |
| **Architecture** | MCP server — works with any MCP client (Claude, Cursor, Codex CLI) | Standalone agent — tied to its own runtime |
| **Model lock-in** | None — any MCP-compatible AI decides what to do | Supports multiple LLMs but runs its own agent loop |
| **Learning memory** | Built-in: auto-learns strategies, tracks errors, O(1) fingerprint recall | Skill-based: 5,000+ community skills, but no automatic learning from usage |
| **Security** | Scoped MCP tools, audit logging, no browser cookie access | Full computer access, uses browser cookies, significant security surface |
| **Setup** | `npm install` + grant accessibility permission | Requires careful sandboxing; not recommended on personal machines |

**TL;DR:** OpenClaw is a powerful autonomous agent for tinkerers who want maximum flexibility. ScreenHand is a focused, fast, secure automation layer designed to be embedded into any AI workflow via MCP — with native API speed instead of screenshot-based guessing.

### Does ScreenHand work on Windows?
Yes. ScreenHand supports both macOS and Windows. On macOS it uses a Swift native bridge with Accessibility APIs. On Windows it uses a C# (.NET 8) bridge with UI Automation and SendInput.

### What AI clients work with ScreenHand?
Any MCP-compatible client: Claude Desktop, Claude Code, Cursor, Windsurf, OpenAI Codex CLI, and any other tool that supports the Model Context Protocol.

### Does ScreenHand need admin/root permissions?
On macOS, you need to grant Accessibility permissions to your terminal app. On Windows, no special permissions are needed — UI Automation works without admin for most applications.

### Is ScreenHand safe to use?
ScreenHand runs locally and never sends screen data to external servers. Dangerous tools (AppleScript, browser JS execution) are audit-logged. You control which AI client connects to it via your MCP configuration.

### Can ScreenHand control any application?
On macOS, it can control any app that exposes Accessibility elements (most apps do). On Windows, it works with any app that supports UI Automation. Some apps with custom rendering (games, some Electron apps) may have limited element-tree support — use OCR as a fallback.

### How fast is ScreenHand?
Accessibility/UI Automation operations take ~50ms. Chrome CDP operations take ~10ms. Screenshots with OCR take ~600ms. Memory lookups add ~0ms (in-memory cache). ScreenHand is significantly faster than screenshot-only approaches because it reads the UI tree directly.

### Does the learning memory affect performance?
No. All memory data is loaded into RAM at startup. Lookups are O(1) hash-map reads. Disk writes are async and buffered — they never block tool responses. The memory system adds effectively zero latency to any tool call.

### Is the memory data safe from corruption?
Yes. JSONL files are parsed line by line — a single corrupted line is skipped without affecting other entries. File locking prevents concurrent-write corruption. Pending writes are flushed synchronously on exit (SIGINT/SIGTERM). Cache sizes are capped with LRU eviction to prevent unbounded growth.

## Contributing

Contributions are welcome! Please open an issue first to discuss what you'd like to change.

```bash
git clone https://github.com/manushi4/screenhand.git
cd screenhand
npm install
npm run build:native
npm test
```

## License

MIT

---

<div align="center">

**[screenhand.com](https://screenhand.com)** | Built by [Khushi Singhal](https://github.com/manushi4) | A product of **Clazro Technology Private Limited**

</div>
package/dist/config.js ADDED

```js
export const DEFAULT_ACTION_BUDGET = {
  locateMs: 800,
  actMs: 200,
  verifyMs: 2000,
  maxRetries: 1,
};
export const DEFAULT_NAVIGATE_TIMEOUT_MS = 10_000;
export const DEFAULT_WAIT_TIMEOUT_MS = 2_000;
export const DEFAULT_PROFILE = "automation";
```
package/dist/index.js ADDED

```js
import { TimelineLogger } from "./logging/timeline-logger.js";
import { MvpMcpServer } from "./mcp/server.js";
import { PlaceholderAppAdapter } from "./runtime/app-adapter.js";
import { CdpChromeAdapter } from "./runtime/cdp-chrome-adapter.js";
import { AutomationRuntimeService } from "./runtime/service.js";

export { PlaceholderAppAdapter } from "./runtime/app-adapter.js";
export { CdpChromeAdapter } from "./runtime/cdp-chrome-adapter.js";
export { AccessibilityAdapter } from "./runtime/accessibility-adapter.js";
export { AppleScriptAdapter } from "./runtime/applescript-adapter.js";
export { VisionAdapter } from "./runtime/vision-adapter.js";
export { CompositeAdapter } from "./runtime/composite-adapter.js";
export { BridgeClient, BridgeClient as MacOSBridgeClient } from "./native/bridge-client.js";
export { StateObserver } from "./runtime/state-observer.js";
export { PlanningLoop } from "./runtime/planning-loop.js";
export { AutomationRuntimeService } from "./runtime/service.js";
export { MvpMcpServer } from "./mcp/server.js";
export { createMcpStdioServer, startMcpStdioServer } from "./mcp/mcp-stdio-server.js";

export function createRuntimeApp(adapter) {
  const logger = new TimelineLogger();
  const runtime = new AutomationRuntimeService(adapter, logger);
  const mcp = new MvpMcpServer(runtime);
  return { runtime, mcp };
}

async function createDefaultAdapter() {
  if (process.env.AUTOMATOR_ADAPTER === "placeholder") {
    return new PlaceholderAppAdapter();
  }
  if (process.env.AUTOMATOR_ADAPTER === "composite") {
    // Lazy import to avoid requiring the Swift bridge for CDP-only usage
    const { MacOSBridgeClient } = await import("./native/macos-bridge-client.js");
    const { CompositeAdapter } = await import("./runtime/composite-adapter.js");
    const bridge = new MacOSBridgeClient();
    return new CompositeAdapter(bridge, {
      headless: process.env.AUTOMATOR_HEADLESS === "1",
    });
  }
  if (process.env.AUTOMATOR_ADAPTER === "accessibility") {
    const { MacOSBridgeClient } = await import("./native/macos-bridge-client.js");
    const { AccessibilityAdapter } = await import("./runtime/accessibility-adapter.js");
    const bridge = new MacOSBridgeClient();
    return new AccessibilityAdapter(bridge);
  }
  return new CdpChromeAdapter({
    headless: process.env.AUTOMATOR_HEADLESS === "1",
  });
}

const app = createRuntimeApp(await createDefaultAdapter());
if (process.argv.includes("--healthcheck")) {
  const session = await app.runtime.sessionStart("automation");
  console.log(JSON.stringify({
    status: "ok",
    session,
    note: "Runtime loaded with universal adapter support.",
  }, null, 2));
}
```
package/dist/logging/timeline-logger.js ADDED

```js
export class TimelineLogger {
  timeline = [];

  start(action, sessionId) {
    return {
      action,
      sessionId,
      startedAt: new Date().toISOString(),
      locateMs: 0,
      actMs: 0,
      verifyMs: 0,
      retries: 0,
    };
  }

  finish(telemetry, status) {
    const finishedAt = new Date().toISOString();
    const totalMs = new Date(finishedAt).getTime() - new Date(telemetry.startedAt).getTime();
    const finalized = {
      ...telemetry,
      finishedAt,
      totalMs,
      status,
    };
    this.timeline.push(finalized);
    return finalized;
  }

  getRecent(limit = 50) {
    return this.timeline.slice(-limit);
  }
}
```