copilot-liku-cli 0.0.9 → 0.0.11

package/INSTALLATION.md CHANGED
@@ -155,6 +155,29 @@ npm link
155
155
 
156
156
  This creates a symbolic link from your global `node_modules` to your local development directory. Any changes you make will be immediately available when you run `liku`.
157
157
 
158
+ ### 3b. Use the local repo version in another project (same machine)
159
+
160
+ If you want another project (e.g., `C:\dev\Whatup`) to use this local working copy instead of the npm-published version:
161
+
162
+ From the other project folder:
163
+
164
+ ```bash
165
+ npm link copilot-liku-cli
166
+ ```
167
+
168
+ Recommended verification (ensures you are using the local linked binary):
169
+
170
+ ```bash
171
+ npx --no-install liku doctor --json
172
+ ```
173
+
174
+ To switch the other project back to the published npm package:
175
+
176
+ ```bash
177
+ npm unlink copilot-liku-cli
178
+ npm install copilot-liku-cli
179
+ ```
180
+
158
181
  ### 4. Verify Setup
159
182
 
160
183
  ```bash
package/QUICKSTART.md CHANGED
@@ -42,6 +42,37 @@ liku start
42
42
  npm start
43
43
  ```
44
44
 
45
+ #### Option 3: Use the local repo version in another project (recommended for dev)
46
+
47
+ If you want a different project (e.g., `C:\dev\Whatup`) to use your *local working copy* of this repo (instead of the npm-published version), use `npm link`.
48
+
49
+ From the repo root:
50
+
51
+ ```bash
52
+ npm link
53
+ ```
54
+
55
+ From the other project:
56
+
57
+ ```bash
58
+ npm link copilot-liku-cli
59
+ ```
60
+
61
+ Verify you’re running the repo copy (recommended):
62
+
63
+ ```bash
64
+ npx --no-install liku doctor --json
65
+ ```
66
+
67
+ Look for `env.projectRoot` being the repo path (e.g., `C:\dev\copilot-Liku-cli`).
68
+
69
+ To switch back to the published npm version:
70
+
71
+ ```bash
72
+ npm unlink copilot-liku-cli
73
+ npm i copilot-liku-cli
74
+ ```
75
+
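The `env.projectRoot` check described above can also be scripted. The following is a minimal sketch, assuming the report has already been parsed from the output of `npx --no-install liku doctor --json`. The helper name `isRunningFromRepo` and the sample object are hypothetical and not part of copilot-liku-cli. The comparison is case-insensitive and uses Windows path rules, which is an assumption based on the `C:\dev\...` examples in this section.

```javascript
const path = require('path');

// Hypothetical helper (not part of copilot-liku-cli): returns true when the
// parsed doctor report says the CLI is running from the expected checkout.
function isRunningFromRepo(report, expectedRoot) {
  const actual = report && report.env && report.env.projectRoot;
  if (typeof actual !== 'string' || !expectedRoot) return false;
  // Assumption: Windows-style paths, so normalize separators, drop any
  // trailing separator, and ignore case before comparing.
  const norm = (p) => path.win32.normalize(p).replace(/[\\/]+$/, '').toLowerCase();
  return norm(actual) === norm(expectedRoot);
}

// Illustrative object shaped like the `env` block of the report.
const sampleReport = { env: { projectRoot: 'C:\\dev\\copilot-Liku-cli' } };
console.log(isRunningFromRepo(sampleReport, 'c:/dev/copilot-liku-cli/'));
```

If this returns `false` after `npm link copilot-liku-cli`, the other project is most likely still resolving the published package rather than the linked working copy.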
45
76
  ## Quick Verify (Recommended)
46
77
 
47
78
  After install, run these checks in order:
@@ -60,6 +91,41 @@ npm run test:ui
60
91
  This order gives clearer pass/fail signals by validating runtime health first,
61
92
  then shortcut routing, then module-level UI automation.
62
93
 
94
+ ### Targeting sanity check
95
+
96
+ Before running keyboard-driven automation (especially browser tab operations), verify what Liku considers the active window:
97
+
98
+ ```bash
99
+ liku doctor
100
+ ```
101
+
102
+ This prints the resolved package root/version (to confirm local vs global) and the current active window (title/process).
103
+
104
+ For deterministic, machine-readable output (recommended for smaller models / automation), use:
105
+
106
+ ```bash
107
+ liku doctor --json
108
+ ```
109
+
110
+ #### `doctor.v1` schema contract (for smaller models)
111
+
112
+ When you consume `liku doctor --json`, treat it as the source-of-truth for targeting and planning. The output is a single JSON object with:
113
+
114
+ - `schemaVersion` (string): currently `doctor.v1`.
115
+ - `ok` (boolean): `false` means at least one `checks[].status === "fail"`.
116
+ - `checks[]` (array): structured checks with `{ id, status: "pass"|"warn"|"fail", message, details? }`.
117
+ - `uiState` (object): UI Automation snapshot
118
+ - `uiState.activeWindow`: where input will go *right now*
119
+ - `uiState.windows[]`: discovered top-level windows (bounded unless `--all`)
120
+ - `targeting` (object | null): present when `doctor` is given a request text
121
+ - `targeting.selectedWindow`: the best-matched window candidate
122
+ - `targeting.candidates[]`: scored alternatives (for disambiguation)
123
+ - `plan` (object | null): present when a request is provided and a plan can be generated
124
+ - `plan.steps[]`: ordered steps, each with `{ state, goal, command, verification, notes? }`
125
+ - `next.commands[]` (array of strings): copy/paste-ready commands extracted from `plan.steps[].command`.
126
+
127
+ **Deterministic execution rule:** run `plan.steps[]` in order, and re-check `liku window --active` after any focus change before sending keys.
128
+
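One way a consumer might apply the `doctor.v1` contract above is sketched here. It assumes the report is already parsed into an object with the documented `schemaVersion`, `ok`, `checks[]`, and `plan.steps[]` fields. The function name `commandsFromReport` and the sample report are hypothetical, and the sample values are illustrative rather than captured output.

```javascript
// Hypothetical consumer (not part of copilot-liku-cli): validates a parsed
// doctor report and returns the plan commands in execution order.
function commandsFromReport(report) {
  if (!report || report.schemaVersion !== 'doctor.v1') {
    throw new Error('Unsupported or missing schemaVersion');
  }
  const failed = (report.checks || []).filter((c) => c.status === 'fail');
  if (report.ok === false || failed.length > 0) {
    const ids = failed.map((c) => c.id).join(', ');
    throw new Error(`doctor reported failing checks: ${ids}`);
  }
  // `plan` may be null when no request text was given; treat that as no steps.
  const steps = report.plan && Array.isArray(report.plan.steps) ? report.plan.steps : [];
  // Keep the original order: the contract says to run plan.steps[] in sequence.
  return steps.map((s) => s.command).filter(Boolean);
}

// Illustrative report; field names follow the contract, values are made up.
const sampleDoctorReport = {
  schemaVersion: 'doctor.v1',
  ok: true,
  checks: [{ id: 'uia.available', status: 'pass', message: 'UI Automation available' }],
  plan: {
    steps: [
      { state: 'VERIFY_ACTIVE_WINDOW', goal: 'Confirm target', command: 'liku window --active', verification: 'Title matches' },
      { state: 'FOCUS_ADDRESS_BAR', goal: 'Focus the address bar', command: 'liku keys ctrl+l', verification: 'Address bar focused' },
    ],
  },
};
console.log(commandsFromReport(sampleDoctorReport).join('\n'));
```

Failing fast on an unknown `schemaVersion` or a failed check keeps a smaller model from sending keys based on a report it cannot trust. The caller would still re-run `liku window --active` after any focus change, as the rule above requires.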
63
129
  `smoke:shortcuts` intentionally validates chat visibility via direct in-app
64
130
  toggle and validates keyboard routing on overlay with target gating.
65
131
 
@@ -138,6 +204,32 @@ Right-click the tray icon to see:
138
204
 
139
205
  ## Common Tasks
140
206
 
207
+ ### Browser actions (Edge/Chrome)
208
+
209
+ When automating browsers, be explicit about **targeting**:
210
+ 1. Ensure the correct browser window is active (bring it to front / focus it)
211
+ 2. Ensure the correct tab is active (click the tab title, or use \`ctrl+1..9\`)
212
+ 3. Then perform the action (e.g., close tab with \`ctrl+w\`)
213
+
214
+ If you skip steps 1–2 and the overlay/chat has focus, keyboard shortcuts may close the overlay instead of affecting the browser.
215
+
216
+ #### Robust recipe (recommended)
217
+
218
+ If your intent is to **continue in an existing Edge/Chrome window/tab**, prefer **in-window control** (focus + keyboard) over launching the browser again.
219
+
220
+ - Prefer: **focus window → new tab / address bar → type → enter → verify**
221
+ - Avoid for “existing tab control”: PowerShell COM \`SendKeys\`, \`Start-Process msedge ...\`, and \`microsoft-edge:...\` (these often open new windows/tabs and can be flaky).
222
+
223
+ **Canonical flow (what to ask the agent to do):**
224
+ 1) Bring the **target browser window** (Edge/Chrome/Firefox/Brave/etc) to the foreground
225
+ 2) \`ctrl+t\` (new tab) then \`ctrl+l\` (address bar)
226
+ 3) Type a full URL (prefer \`https://...\`) and press Enter
227
+ 4) Wait for load, then perform page-level action (e.g., YouTube search)
228
+ 5) Validate after major steps; if typing drops characters, re-focus the address bar and retry
229
+
230
+ **Self-heal typing retry (when URL is wrong):**
231
+ \`ctrl+l\` → \`ctrl+a\` → type URL again → \`enter\`
232
+
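Steps 2 and 3 of the canonical flow and the self-heal retry above can be expressed as ordered command lists. This is a rough sketch, not the package's implementation: `canonicalFlow`, `retryFlow`, `toHttps`, and `quoteArg` are hypothetical names. The `liku keys` and `liku type` command forms are taken from the plan steps elsewhere in this diff, and `liku keys ctrl+a` is assumed to follow the same pattern. Bringing the browser window to the foreground (step 1) is left out here.

```javascript
// Hypothetical helpers (not part of copilot-liku-cli).

// Prefer a full https:// URL, as the canonical flow recommends.
function toHttps(urlish) {
  const u = String(urlish || '').trim();
  return /^https?:\/\//i.test(u) ? u : `https://${u}`;
}

// Wrap text in double quotes, escaping any embedded double quotes.
function quoteArg(text) {
  return `"${String(text).replace(/"/g, '\\"')}"`;
}

// Steps 2-3 of the canonical flow: new tab, address bar, type URL, Enter.
function canonicalFlow(urlish, { newTab = true } = {}) {
  const steps = [];
  if (newTab) steps.push('liku keys ctrl+t');
  steps.push('liku keys ctrl+l', `liku type ${quoteArg(toHttps(urlish))}`, 'liku keys enter');
  return steps;
}

// Self-heal retry: re-focus the address bar, select all, retype, Enter.
function retryFlow(urlish) {
  return [
    'liku keys ctrl+l',
    'liku keys ctrl+a', // assumed to be accepted like the other key combos
    `liku type ${quoteArg(toHttps(urlish))}`,
    'liku keys enter',
  ];
}

console.log(canonicalFlow('youtube.com').join('\n'));
```

Returning plain command strings keeps the sequence easy to log and review before anything is sent to the browser. Whether to pause between steps is left to the caller.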
141
233
  ### Selecting a Screen Element
142
234
  ```
143
235
  1. Press Ctrl+Alt+Space to open chat
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "copilot-liku-cli",
3
- "version": "0.0.9",
3
+ "version": "0.0.11",
4
4
  "description": "GitHub Copilot CLI with headless agent + ultra-thin overlay architecture",
5
5
  "main": "src/main/index.js",
6
6
  "bin": {
@@ -0,0 +1,816 @@
1
+ /**
2
+ * doctor command - Minimal diagnostics for targeting reliability
3
+ * @module cli/commands/doctor
4
+ */
5
+
6
+ const path = require('path');
7
+ const { success, error, info, highlight, dim } = require('../util/output');
8
+
9
+ const PROJECT_ROOT = path.resolve(__dirname, '../../..');
10
+ const UI_MODULE = path.resolve(__dirname, '../../main/ui-automation');
11
+
12
+ const DOCTOR_SCHEMA_VERSION = 'doctor.v1';
13
+
14
+ function safeJsonStringify(value) {
15
+ try {
16
+ return JSON.stringify(value, null, 2);
17
+ } catch {
18
+ return null;
19
+ }
20
+ }
21
+
22
+ async function withConsoleSilenced(enabled, fn) {
23
+ if (!enabled) {
24
+ return fn();
25
+ }
26
+
27
+ const original = {
28
+ log: console.log,
29
+ info: console.info,
30
+ warn: console.warn,
31
+ error: console.error,
32
+ };
33
+
34
+ console.log = () => {};
35
+ console.info = () => {};
36
+ console.warn = () => {};
37
+ console.error = () => {};
38
+
39
+ try {
40
+ return await fn();
41
+ } finally {
42
+ console.log = original.log;
43
+ console.info = original.info;
44
+ console.warn = original.warn;
45
+ console.error = original.error;
46
+ }
47
+ }
48
+
49
+ function normalizeText(text) {
50
+ return String(text || '').trim();
51
+ }
52
+
53
+ function normalizeForMatch(text) {
54
+ return normalizeText(text).toLowerCase();
55
+ }
56
+
57
+ function normalizeForLooseMatch(text) {
58
+ return normalizeForMatch(text)
59
+ .replace(/[^a-z0-9]+/g, ' ')
60
+ .replace(/\s+/g, ' ')
61
+ .trim();
62
+ }
63
+
64
+ function includesCI(haystack, needle) {
65
+ if (!haystack || !needle) return false;
66
+ // Loose match to tolerate punctuation differences (e.g., "Microsoft? Edge Beta")
67
+ return normalizeForLooseMatch(haystack).includes(normalizeForLooseMatch(needle));
68
+ }
69
+
70
+ function extractQuotedStrings(text) {
71
+ const out = [];
72
+ const str = normalizeText(text);
73
+ const re = /"([^"]+)"|'([^']+)'/g;
74
+ let m;
75
+ while ((m = re.exec(str)) !== null) {
76
+ const val = m[1] || m[2];
77
+ if (val) out.push(val);
78
+ }
79
+ return out;
80
+ }
81
+
82
+ function escapeDoubleQuotes(text) {
83
+ return String(text || '').replace(/"/g, '\\"');
84
+ }
85
+
86
+ function extractUrlCandidate(text) {
87
+ const str = normalizeText(text);
88
+
89
+ // Full URL
90
+ const fullUrl = /(https?:\/\/[^\s"']+)/i.exec(str);
91
+ if (fullUrl?.[1]) return fullUrl[1];
92
+
93
+ // Common bare domains (keep conservative)
94
+ const bare = /\b([a-z0-9-]+\.)+(com|net|org|io|ai|dev|edu|gov)(\/[^\s"']*)?\b/i.exec(str);
95
+ if (bare?.[0]) return bare[0];
96
+
97
+ return null;
98
+ }
99
+
100
+ function extractSearchQuery(text) {
101
+ const str = normalizeText(text);
102
+ const quoted = extractQuotedStrings(str);
103
+
104
+ // Prefer quoted strings if user said search ... for "..."
105
+ const searchFor = /\bsearch\b/i.test(str) && /\bfor\b/i.test(str);
106
+ if (searchFor && quoted.length) return quoted[0];
107
+
108
+ // Unquoted: search (on/in)? (youtube/google)? for <rest>
109
+ const m = /\bsearch(?:\s+(?:on|in))?(?:\s+(?:youtube|google))?\s+for\s+([^\n\r.;]+)$/i.exec(str);
110
+ if (m?.[1]) return normalizeText(m[1]);
111
+
112
+ return null;
113
+ }
114
+
115
+ function toHttpsUrl(urlish) {
116
+ const u = normalizeText(urlish);
117
+ if (!u) return null;
118
+ if (/^https?:\/\//i.test(u)) return u;
119
+ return `https://${u}`;
120
+ }
121
+
122
+ function buildSearchUrl({ query, preferYouTube = false }) {
123
+ const q = normalizeText(query);
124
+ if (!q) return null;
125
+ if (preferYouTube) {
126
+ return `https://www.youtube.com/results?search_query=${encodeURIComponent(q)}`;
127
+ }
128
+ return `https://www.google.com/search?q=${encodeURIComponent(q)}`;
129
+ }
130
+
131
+ function parseRequestHints(requestText) {
132
+ const text = normalizeText(requestText);
133
+ const lower = normalizeForMatch(text);
134
+
135
+ // Extract common patterns
136
+ const tabTitleMatch = /\btab\s+(?:titled|named|called)\s+(?:"([^"]+)"|'([^']+)'|([^,.;\n\r]+))/i.exec(text);
137
+ const tabTitle = tabTitleMatch ? normalizeText(tabTitleMatch[1] || tabTitleMatch[2] || tabTitleMatch[3]) : null;
138
+
139
+ const inWindowMatch = /\b(?:in|within)\s+([^\n\r]+?)\s+window\b/i.exec(text);
140
+ const windowHint = inWindowMatch ? normalizeText(inWindowMatch[1]) : null;
141
+
142
+ const wantsNewTab = /\bnew\s+tab\b/i.test(text) || /\bopen\s+a\s+new\s+tab\b/i.test(text);
143
+ const urlCandidate = extractUrlCandidate(text);
144
+ const searchQuery = extractSearchQuery(text);
145
+
146
+ const wantsIntegratedBrowser = /\b(integrated\s+browser|simple\s+browser|inside\s+vs\s*code|in\s+vs\s*code|vscode\s+insiders|workbench\.browser\.openlocalhostlinks|live\s+preview)\b/i.test(text);
147
+
148
+ const browserSignals = Boolean(urlCandidate)
149
+ || Boolean(searchQuery)
150
+ || /\b(go\s+to|navigate|visit|open\s+youtube|youtube\.com|search)\b/i.test(text);
151
+
152
+ // Heuristic: infer app family
153
+ const appHints = {
154
+ isBrowser: /\b(edge|chrome|chromium|firefox|brave|opera|vivaldi|browser|msedge)\b/i.test(text) || browserSignals,
155
+ isEditor: /\b(vs\s*code|visual\s*studio\s*code|code\s*-\s*insiders|editor)\b/i.test(text),
156
+ isTerminal: /\b(terminal|powershell|cmd\.exe|command\s+prompt|windows\s+terminal)\b/i.test(text),
157
+ isExplorer: /\b(file\s+explorer|explorer\.exe)\b/i.test(text),
158
+ };
159
+
160
+ const requestedBrowser = (() => {
161
+ // Ordered from most-specific to least-specific
162
+ if (/\bedge\s+beta\b/i.test(text)) return { name: 'edge', keywords: ['edge', 'msedge', 'beta'] };
163
+ if (/\bmsedge\b/i.test(text) || /\bmicrosoft\s+edge\b/i.test(text) || /\bedge\b/i.test(text)) return { name: 'edge', keywords: ['edge', 'msedge'] };
164
+ if (/\bgoogle\s+chrome\b/i.test(text) || /\bchrome\b/i.test(text) || /\bchromium\b/i.test(text)) return { name: 'chrome', keywords: ['chrome', 'chromium'] };
165
+ if (/\bmozilla\s+firefox\b/i.test(text) || /\bfirefox\b/i.test(text)) return { name: 'firefox', keywords: ['firefox'] };
166
+ if (/\bbrave\b/i.test(text)) return { name: 'brave', keywords: ['brave'] };
167
+ if (/\bvivaldi\b/i.test(text)) return { name: 'vivaldi', keywords: ['vivaldi'] };
168
+ if (/\bopera\b/i.test(text)) return { name: 'opera', keywords: ['opera'] };
169
+ return null;
170
+ })();
171
+
172
+ // Infer intent
173
+ const intent = (() => {
174
+ if (/\bclose\b/.test(lower) && /\btab\b/.test(lower)) return 'close_tab';
175
+ if (/\bclose\b/.test(lower) && /\bwindow\b/.test(lower)) return 'close_window';
176
+ if (appHints.isBrowser && (urlCandidate || searchQuery)) return 'browser_navigate';
177
+ if (appHints.isBrowser && /\b(new\s+tab|open\s+tab|ctrl\+t|ctrl\+l|navigate|go\s+to|visit|open\s+youtube|youtube\.com|search\s+for|search)\b/i.test(text)) return 'browser_navigate';
178
+ if (/\bclick\b/.test(lower)) return 'click';
179
+ if (/\btype\b/.test(lower) || /\benter\b/.test(lower)) return 'type';
180
+ if (/\bscroll\b/.test(lower)) return 'scroll';
181
+ if (/\bdrag\b/.test(lower)) return 'drag';
182
+ if (/\bfind\b/.test(lower) || /\blocate\b/.test(lower)) return 'find';
183
+ if (/\bfocus\b/.test(lower) || /\bactivate\b/.test(lower) || /\bbring\b/.test(lower)) return 'focus';
184
+ return 'unknown';
185
+ })();
186
+
187
+ const quoted = extractQuotedStrings(text);
188
+
189
+ // Potential element text is often quoted, but avoid using the tab title as element text.
190
+ const elementTextCandidates = quoted.filter(q => q && q !== tabTitle);
191
+
192
+ return {
193
+ raw: text,
194
+ intent,
195
+ windowHint,
196
+ tabTitle,
197
+ appHints,
198
+ elementTextCandidates,
199
+ wantsNewTab,
200
+ urlCandidate,
201
+ searchQuery,
202
+ requestedBrowser,
203
+ wantsIntegratedBrowser,
204
+ };
205
+ }
206
+
207
+ function isLikelyBrowserWindow(win) {
208
+ const title = win?.title || '';
209
+ const proc = win?.processName || '';
210
+ return (
211
+ includesCI(proc, 'msedge') || includesCI(title, 'edge') ||
212
+ includesCI(proc, 'chrome') || includesCI(title, 'chrome') ||
213
+ includesCI(proc, 'firefox') || includesCI(title, 'firefox') ||
214
+ includesCI(proc, 'brave') || includesCI(title, 'brave') ||
215
+ includesCI(proc, 'opera') || includesCI(title, 'opera') ||
216
+ includesCI(proc, 'vivaldi') || includesCI(title, 'vivaldi')
217
+ );
218
+ }
219
+
220
+ function isLikelyVSCodeWindow(win) {
221
+ const title = win?.title || '';
222
+ const proc = win?.processName || '';
223
+ return (
224
+ includesCI(proc, 'Code') || includesCI(proc, 'Code - Insiders') ||
225
+ includesCI(title, 'Visual Studio Code')
226
+ );
227
+ }
228
+
229
+ function isLocalhostUrl(urlish) {
230
+ const u = normalizeText(urlish);
231
+ if (!u) return false;
232
+ return /^(https?:\/\/)?(localhost|127\.0\.0\.1)(:\d+)?(\/|$)/i.test(u);
233
+ }
234
+
235
+ function scoreWindowCandidate(win, hints) {
236
+ let score = 0;
237
+ const reasons = [];
238
+
239
+ const title = win?.title || '';
240
+ const proc = win?.processName || '';
241
+
242
+ if (hints.windowHint && includesCI(title, hints.windowHint)) {
243
+ score += 60;
244
+ reasons.push('title matches windowHint');
245
+ }
246
+
247
+ const looksLikeBrowser = isLikelyBrowserWindow(win);
248
+
249
+ if (hints.appHints?.isBrowser && looksLikeBrowser) {
250
+ score += 35;
251
+ reasons.push('looks like browser');
252
+ }
253
+
254
+ if (hints.requestedBrowser?.keywords?.length) {
255
+ const matchesPreferred = hints.requestedBrowser.keywords.some(k => includesCI(proc, k) || includesCI(title, k));
256
+ if (matchesPreferred) {
257
+ score += 25;
258
+ reasons.push(`matches requested browser (${hints.requestedBrowser.name})`);
259
+ }
260
+ }
261
+ if (hints.appHints?.isEditor && (includesCI(title, 'visual studio code') || includesCI(title, 'code - insiders') || includesCI(proc, 'Code') || includesCI(proc, 'Code - Insiders'))) {
262
+ score += 35;
263
+ reasons.push('looks like editor');
264
+ }
265
+ if (hints.appHints?.isTerminal && (includesCI(title, 'terminal') || includesCI(proc, 'WindowsTerminal') || includesCI(proc, 'pwsh') || includesCI(proc, 'cmd'))) {
266
+ score += 30;
267
+ reasons.push('looks like terminal');
268
+ }
269
+ if (hints.appHints?.isExplorer && (includesCI(proc, 'explorer') || includesCI(title, 'file explorer'))) {
270
+ score += 30;
271
+ reasons.push('looks like explorer');
272
+ }
273
+
274
+ // Prefer non-empty titled windows
275
+ if (normalizeText(title).length > 0) {
276
+ score += 3;
277
+ }
278
+
279
+ return { score, reasons };
280
+ }
281
+
282
+ function buildSuggestedPlan(hints, activeWindow, rankedCandidates) {
283
+ const windowsRanked = Array.isArray(rankedCandidates) ? rankedCandidates.map(c => c.window).filter(Boolean) : [];
284
+ const browserWindowsRanked = windowsRanked.filter(isLikelyBrowserWindow);
285
+ const vsCodeWindowsRanked = windowsRanked.filter(isLikelyVSCodeWindow);
286
+
287
+ const target = (() => {
288
+ // If the user explicitly wants the VS Code integrated browser, target VS Code.
289
+ if (hints.wantsIntegratedBrowser) {
290
+ if (vsCodeWindowsRanked[0]) return vsCodeWindowsRanked[0];
291
+ if (activeWindow && isLikelyVSCodeWindow(activeWindow)) return activeWindow;
292
+ return windowsRanked[0] || activeWindow || null;
293
+ }
294
+
295
+ // For browser actions, never target an arbitrary non-browser window.
296
+ if (hints.intent === 'browser_navigate' && hints.appHints?.isBrowser) {
297
+ if (hints.requestedBrowser?.keywords?.length) {
298
+ const preferred = browserWindowsRanked.find(w => hints.requestedBrowser.keywords.some(k => includesCI(w?.processName || '', k) || includesCI(w?.title || '', k)));
299
+ if (preferred) return preferred;
300
+ }
301
+
302
+ // Fallback to any detected browser window, else the active window if it is a browser.
303
+ if (browserWindowsRanked[0]) return browserWindowsRanked[0];
304
+ if (activeWindow && isLikelyBrowserWindow(activeWindow)) return activeWindow;
305
+ return null;
306
+ }
307
+
308
+ // Non-browser intents: use ranking, then active window.
309
+ return windowsRanked[0] || activeWindow || null;
310
+ })();
311
+ const plan = [];
312
+
313
+ const targetTitleForFilter = target?.title ? String(target.title) : null;
314
+
315
+ const targetSelector = (() => {
316
+ if (!target) return null;
317
+ if (typeof target.hwnd === 'number' && Number.isFinite(target.hwnd)) {
318
+ return { by: 'hwnd', value: target.hwnd };
319
+ }
320
+ if (target.title) {
321
+ return { by: 'title', value: target.title };
322
+ }
323
+ return null;
324
+ })();
325
+
326
+ // State machine-ish scaffold. Keep it deterministic and CLI-driven.
327
+ plan.push({
328
+ state: 'VERIFY_ACTIVE_WINDOW',
329
+ goal: 'Confirm which window will receive input',
330
+ command: 'liku window --active',
331
+ verification: 'Active window title/process match the intended target',
332
+ });
333
+
334
+ if (targetSelector && hints.intent !== 'unknown') {
335
+ const frontCmd = targetSelector.by === 'hwnd'
336
+ ? `liku window --front --hwnd ${targetSelector.value}`
337
+ : `liku window --front "${String(targetSelector.value).replace(/"/g, '\\"')}"`;
338
+
339
+ plan.unshift({
340
+ state: 'FOCUS_TARGET_WINDOW',
341
+ goal: 'Bring the intended target window to the foreground',
342
+ command: frontCmd,
343
+ verification: 'Window is foreground and becomes active',
344
+ });
345
+ }
346
+
347
+ // Tab targeting for browsers is always a separate step.
348
+ if (hints.intent === 'close_tab' && hints.tabTitle) {
349
+ const windowFilter = targetTitleForFilter ? ` --window "${targetTitleForFilter.replace(/"/g, '\\"')}"` : '';
350
+ plan.push({
351
+ state: 'ACTIVATE_TARGET_TAB',
352
+ goal: `Make the tab active: "${hints.tabTitle}"`,
353
+ command: `liku click "${String(hints.tabTitle).replace(/"/g, '\\"')}" --type TabItem${windowFilter}`,
354
+ verification: 'The tab becomes active (visually highlighted)',
355
+ notes: 'If UIA cannot see browser tabs, fall back to ctrl+1..9 or ctrl+tab cycling with waits.',
356
+ });
357
+ plan.push({
358
+ state: 'EXECUTE_ACTION',
359
+ goal: 'Close the active tab',
360
+ command: 'liku keys ctrl+w',
361
+ verification: 'Tab disappears; previous tab becomes active',
362
+ });
363
+ return { target, plan };
364
+ }
365
+
366
+ if (hints.intent === 'browser_navigate' && hints.appHints?.isBrowser) {
367
+ // If running inside VS Code and the user wants it, prefer using the Integrated Browser.
368
+ if (hints.wantsIntegratedBrowser) {
369
+ const url = toHttpsUrl(hints.urlCandidate) || buildSearchUrl({ query: hints.searchQuery, preferYouTube: false });
370
+ const localhostish = isLocalhostUrl(hints.urlCandidate);
371
+
372
+ plan.push({
373
+ state: 'OPEN_INTEGRATED_BROWSER',
374
+ goal: 'Open VS Code Integrated Browser',
375
+ command: 'liku keys ctrl+shift+p',
376
+ verification: 'Command Palette opens',
377
+ notes: 'Run the VS Code command: "Browser: Open Integrated Browser"',
378
+ });
379
+ plan.push({
380
+ state: 'COMMAND_INTEGRATED_BROWSER',
381
+ goal: 'Run the Integrated Browser command',
382
+ command: 'liku type "Browser: Open Integrated Browser"',
383
+ verification: 'The command appears in the palette',
384
+ });
385
+ plan.push({
386
+ state: 'CONFIRM_COMMAND',
387
+ goal: 'Execute the command',
388
+ command: 'liku keys enter',
389
+ verification: 'An Integrated Browser editor tab opens',
390
+ notes: localhostish
391
+ ? 'Tip: enable the VS Code setting workbench.browser.openLocalhostLinks to automatically open localhost links in the integrated browser.'
392
+ : 'Integrated Browser supports http(s) and file URLs.',
393
+ });
394
+
395
+ if (localhostish) {
396
+ plan.push({
397
+ state: 'OPEN_SETTINGS',
398
+ goal: 'Open VS Code Settings (optional)',
399
+ command: 'liku keys ctrl+,',
400
+ verification: 'Settings UI opens',
401
+ });
402
+ plan.push({
403
+ state: 'FIND_SETTING',
404
+ goal: 'Locate the localhost-integrated-browser setting',
405
+ command: 'liku type "workbench.browser.openLocalhostLinks"',
406
+ verification: 'The setting appears in search results',
407
+ notes: 'Enable it to route localhost links to the Integrated Browser.',
408
+ });
409
+ plan.push({
410
+ state: 'VERIFY_SETTING',
411
+ goal: 'Capture evidence of the setting state',
412
+ command: 'liku screenshot',
413
+ verification: 'Screenshot shows the setting and whether it is enabled',
414
+ });
415
+ }
416
+
417
+ if (url) {
418
+ plan.push({
419
+ state: 'FOCUS_ADDRESS_BAR',
420
+ goal: 'Focus the integrated browser address bar',
421
+ command: 'liku keys ctrl+l',
422
+ verification: 'Address bar is focused (URL text highlighted)',
423
+ });
424
+ plan.push({
425
+ state: 'TYPE_URL',
426
+ goal: 'Type the destination URL',
427
+ command: `liku type "${escapeDoubleQuotes(url)}"`,
428
+ verification: 'The full URL appears correctly in the address bar',
429
+ });
430
+ plan.push({
431
+ state: 'NAVIGATE',
432
+ goal: 'Navigate to the URL in the integrated browser',
433
+ command: 'liku keys enter',
434
+ verification: 'Page begins loading; content changes',
435
+ });
436
+ } else {
437
+ plan.push({
438
+ state: 'MISSING_URL',
439
+ goal: 'No URL could be inferred from the request',
440
+ command: 'liku screenshot',
441
+ verification: 'Use the screenshot to decide the next navigation step',
442
+ });
443
+ }
444
+
445
+ plan.push({
446
+ state: 'VERIFY_RESULT',
447
+ goal: 'Capture evidence of the resulting page state',
448
+ command: 'liku screenshot',
449
+ verification: 'Screenshot shows expected page state in the integrated browser',
450
+ });
451
+
452
+ return { target, plan };
453
+ }
454
+
455
+ if (!target) {
456
+ plan.push({
457
+ state: 'NO_BROWSER_WINDOW',
458
+ goal: 'No browser window was detected; open a browser window first',
459
+ command: 'liku window',
460
+ verification: 'A browser window (Edge/Chrome/Firefox/Brave/etc) appears in the list',
461
+ });
462
+ return { target: null, plan };
463
+ }
464
+
465
+ // Prefer deterministic in-window navigation over process launch.
466
+ const preferYouTube = /\byoutube\b/i.test(hints.raw || '') || /youtube\.com/i.test(hints.raw || '');
467
+ const url = (
468
+ toHttpsUrl(hints.urlCandidate) ||
469
+ buildSearchUrl({ query: hints.searchQuery, preferYouTube })
470
+ );
471
+
472
+ if (hints.wantsNewTab) {
473
+ plan.push({
474
+ state: 'OPEN_NEW_TAB',
475
+ goal: 'Open a new tab in the focused browser window',
476
+ command: 'liku keys ctrl+t',
477
+ verification: 'A new tab opens (tab count increases or blank tab appears)',
478
+ });
479
+ }
480
+
481
+ plan.push({
482
+ state: 'FOCUS_ADDRESS_BAR',
483
+ goal: 'Focus the address bar',
484
+ command: 'liku keys ctrl+l',
485
+ verification: 'Address bar is focused (URL text highlighted)',
486
+ notes: 'If focus is flaky, re-run `liku window --active` and re-focus the browser window before sending keys.',
487
+ });
488
+
489
+ if (url) {
490
+ plan.push({
491
+ state: 'TYPE_URL',
492
+ goal: `Type the destination URL${hints.searchQuery ? ' (search encoded into URL for reliability)' : ''}`,
493
+ command: `liku type "${escapeDoubleQuotes(url)}"`,
494
+ verification: 'The full URL appears correctly in the address bar',
495
+ notes: 'If characters drop: ctrl+l → ctrl+a → type URL again → enter (with short pauses).',
496
+ });
497
+ plan.push({
498
+ state: 'NAVIGATE',
499
+ goal: 'Navigate to the URL in the current tab',
500
+ command: 'liku keys enter',
501
+ verification: 'Page begins loading; title/content changes',
502
+ });
503
+ } else {
504
+ plan.push({
505
+ state: 'MISSING_URL',
506
+ goal: 'No URL could be inferred from the request',
507
+ command: 'liku screenshot',
508
+ verification: 'Use the screenshot to decide the next navigation step',
509
+ });
510
+ }
511
+
512
+ plan.push({
513
+ state: 'VERIFY_FOCUS',
514
+ goal: 'Verify keyboard focus stayed on the browser window',
515
+ command: 'liku window --active',
516
+ verification: hints.requestedBrowser?.name
517
+ ? `Active window process/title matches the requested browser (${hints.requestedBrowser.name})`
518
+ : 'Active window process/title matches a browser window',
519
+ });
520
+
521
+ plan.push({
522
+ state: 'VERIFY_RESULT',
523
+ goal: 'Capture evidence of the resulting page state',
524
+ command: 'liku screenshot',
525
+ verification: 'Screenshot shows expected page (e.g., YouTube results for query)',
526
+ });
527
+
528
+ return { target, plan };
529
+ }
530
+
531
+ if (hints.intent === 'close_window') {
532
+ plan.push({
533
+ state: 'EXECUTE_ACTION',
534
+ goal: 'Close the active window',
535
+ command: 'liku keys alt+f4',
536
+ verification: 'Window closes and focus changes',
537
+ notes: 'Prefer alt+f4 for closing windows; ctrl+shift+w is app-specific and can close the wrong thing.',
538
+ });
539
+ return { target, plan };
540
+ }
541
+
542
+ if (hints.intent === 'click') {
543
+ const elementText = hints.elementTextCandidates?.[0] || null;
544
+ if (elementText) {
545
+ const windowFilter = targetTitleForFilter ? ` --window "${targetTitleForFilter.replace(/"/g, '\\"')}"` : '';
546
+ plan.push({
547
+ state: 'EXECUTE_ACTION',
548
+ goal: `Click element: "${elementText}"`,
549
+ command: `liku click "${String(elementText).replace(/"/g, '\\"')}"${windowFilter}`,
550
+ verification: 'Expected UI response occurs (button press, navigation, etc.)',
551
+ });
552
+ }
553
+ return { target, plan };
554
+ }
555
+
556
+ // Generic fallback: ensure focus + suggest next step.
557
+ plan.push({
558
+ state: 'NEXT',
559
+ goal: 'If the target is not correct, refine the window hint and retry',
560
+ command: 'liku window # list windows',
561
+ verification: 'You can identify the intended window title/process',
562
+ });
563
+
564
+ return { target, plan };
565
+ }
566
+
567
+ function mermaidForPlan(plan) {
568
+ if (!Array.isArray(plan) || plan.length === 0) return null;
569
+ const ids = plan.map(p => p.state);
570
+ const edges = [];
571
+ for (let i = 0; i < ids.length - 1; i++) {
572
+ edges.push(`${ids[i]} --> ${ids[i + 1]}`);
573
+ }
574
+ return `stateDiagram-v2\n ${edges.join('\n ')}`;
575
+ }
576
+
577
+ function buildChecks({ uiaError, activeWindow, windows, requestText, requestHints, requestAnalysis }) {
578
+ const checks = [];
579
+ const push = (id, status, message, details = null) => {
580
+ checks.push({ id, status, message, details });
581
+ };
582
+
583
+ push(
584
+ 'uia.available',
585
+ uiaError ? 'fail' : 'pass',
586
+ uiaError ? 'UI Automation unavailable or errored' : 'UI Automation available',
587
+ uiaError ? { error: uiaError } : null
588
+ );
589
+
590
+ push(
591
+ 'ui.activeWindow.present',
592
+ activeWindow ? 'pass' : 'warn',
593
+ activeWindow ? 'Active window detected' : 'Active window missing',
594
+ activeWindow ? { title: activeWindow.title, processName: activeWindow.processName, hwnd: activeWindow.hwnd } : null
595
+ );
596
+
597
+ push(
598
+ 'ui.windows.enumerated',
599
+ Array.isArray(windows) && windows.length > 0 ? 'pass' : 'warn',
600
+ Array.isArray(windows) && windows.length > 0 ? `Enumerated ${windows.length} windows` : 'No windows enumerated',
601
+ Array.isArray(windows) ? { count: windows.length } : { count: 0 }
602
+ );
603
+
604
+ if (requestText) {
605
+ push(
606
+ 'request.parsed',
607
+ requestHints ? 'pass' : 'fail',
608
+ requestHints ? 'Request parsed into hints' : 'Request parsing failed',
609
+ requestHints || null
610
+ );
611
+ push(
612
+ 'request.plan.generated',
613
+ requestAnalysis?.plan?.length ? 'pass' : 'warn',
614
+ requestAnalysis?.plan?.length ? `Generated ${requestAnalysis.plan.length} plan steps` : 'No plan steps generated',
615
+ requestAnalysis?.plan?.length ? { steps: requestAnalysis.plan.map(s => s.state) } : null
616
+ );
617
+ }
618
+
619
+ return checks;
620
+ }
621
+
622
+ function summarizeChecks(checks) {
623
+ const summary = { pass: 0, warn: 0, fail: 0 };
624
+ for (const c of checks) {
625
+ if (c.status === 'pass') summary.pass += 1;
626
+ else if (c.status === 'warn') summary.warn += 1;
627
+ else if (c.status === 'fail') summary.fail += 1;
628
+ }
629
+ return summary;
630
+ }
631
+
632
+ async function run(args, options) {
633
+ // Load package metadata from the resolved project root (this is the key signal
634
+ // for "am I running the local install or some other copy?")
635
+ let pkg;
636
+ try {
637
+ pkg = require(path.join(PROJECT_ROOT, 'package.json'));
638
+ } catch (e) {
639
+ if (!options.quiet) {
640
+ error(`Failed to load package.json from ${PROJECT_ROOT}: ${e.message}`);
641
+ }
642
+ return { success: false, error: 'Could not load package metadata', projectRoot: PROJECT_ROOT };
643
+ }
644
+
645
+ const generatedAt = new Date().toISOString();
646
+
647
+ const envInfo = {
648
+ name: pkg.name,
649
+ version: pkg.version,
650
+ projectRoot: PROJECT_ROOT,
651
+ cwd: process.cwd(),
652
+ node: process.version,
653
+ platform: process.platform,
654
+ arch: process.arch,
655
+ execPath: process.execPath,
656
+ };
657
+
658
+ const requestText = args.length > 0 ? args.join(' ') : null;
659
+ const requestHints = requestText ? parseRequestHints(requestText) : null;
660
+
661
+ // UIA / active window + other state
662
+ let activeWindow = null;
663
+ let windows = [];
664
+ let mouse = null;
665
+ let uiaError = null;
666
+ await withConsoleSilenced(Boolean(options.json), async () => {
667
+ try {
668
+ // Lazy load so doctor still works even if UIA deps are missing
669
+ // (we'll just report that in output)
670
+ // eslint-disable-next-line global-require, import/no-dynamic-require
671
+ const ui = require(UI_MODULE);
672
+ activeWindow = await ui.getActiveWindow();
673
+ mouse = await ui.getMousePosition();
674
+
675
+ // Keep window lists bounded by default.
676
+ const maxWindows = options.all ? Number.MAX_SAFE_INTEGER : (options.windows ? parseInt(options.windows, 10) : 15);
677
+ const allWindows = await ui.findWindows({});
678
+ windows = Array.isArray(allWindows) ? allWindows.slice(0, maxWindows) : [];
679
+
680
+ if (!activeWindow) {
681
+ uiaError = 'No active window detected';
682
+ }
683
+ } catch (e) {
684
+ uiaError = e.message;
685
+ }
686
+ });
687
+
688
+ // Candidate targeting analysis (optional)
689
+ let requestAnalysis = null;
690
+ if (requestHints) {
691
+ const candidates = (Array.isArray(windows) ? windows : []).map(w => {
692
+ const { score, reasons } = scoreWindowCandidate(w, requestHints);
693
+ return { score, reasons, window: w };
694
+ }).sort((a, b) => b.score - a.score);
695
+
696
+ const { target, plan } = buildSuggestedPlan(requestHints, activeWindow, candidates);
697
+ requestAnalysis = {
698
+ request: requestHints,
699
+ target,
700
+ candidates: candidates.slice(0, 8).map(c => ({ score: c.score, reasons: c.reasons, window: c.window })),
701
+ plan,
702
+ mermaid: options.flow ? mermaidForPlan(plan) : null,
703
+ };
704
+ }
705
+
706
+ const checks = buildChecks({ uiaError, activeWindow, windows, requestText, requestHints, requestAnalysis });
707
+ const checksSummary = summarizeChecks(checks);
708
+ const ok = checksSummary.fail === 0;
709
+
710
+ const report = {
711
+ schemaVersion: DOCTOR_SCHEMA_VERSION,
712
+ generatedAt,
713
+ ok,
714
+ checks,
715
+ checksSummary,
716
+ env: envInfo,
717
+ request: requestText ? { text: requestText, hints: requestHints } : null,
718
+ uiState: {
719
+ activeWindow,
720
+ windows,
721
+ mouse,
722
+ uiaError: uiaError || null,
723
+ },
724
+ targeting: requestAnalysis ? {
725
+ selectedWindow: requestAnalysis.target || null,
726
+ candidates: requestAnalysis.candidates || [],
727
+ } : null,
728
+ plan: requestAnalysis ? {
729
+ steps: requestAnalysis.plan || [],
730
+ mermaid: requestAnalysis.mermaid || null,
731
+ } : null,
732
+ next: {
733
+ commands: (
734
+ requestAnalysis?.plan?.length
735
+ ? requestAnalysis.plan.map(s => s.command).filter(Boolean)
736
+ : ['liku window --active', 'liku window']
737
+ ),
738
+ },
739
+ };
740
+
741
+ if (options.json) {
742
+ // Caller wants machine-readable output
743
+ return report;
744
+ }
745
+
746
+ if (!options.quiet) {
747
+ console.log(`\n${highlight('Liku Diagnostics (doctor)')}\n`);
748
+
749
+ console.log(`${highlight('Package:')} ${envInfo.name} v${envInfo.version}`);
750
+ console.log(`${highlight('Resolved root:')} ${envInfo.projectRoot}`);
751
+ console.log(`${highlight('Node:')} ${envInfo.node} (${envInfo.platform}/${envInfo.arch})`);
752
+ console.log(`${highlight('CWD:')} ${envInfo.cwd}`);
753
+
754
+ console.log(`${highlight('Schema:')} ${DOCTOR_SCHEMA_VERSION}`);
755
+ console.log(`${highlight('OK:')} ${ok ? 'true' : 'false'} ${dim(`(pass=${checksSummary.pass} warn=${checksSummary.warn} fail=${checksSummary.fail})`)}`);
756
+
757
+ console.log(`\n${highlight('Active window:')}`);
758
+ if (activeWindow) {
759
+ const bounds = activeWindow.bounds || { x: '?', y: '?', width: '?', height: '?' };
760
+ console.log(` Title: ${activeWindow.title || dim('(unknown)')}`);
761
+ console.log(` Process: ${activeWindow.processName || dim('(unknown)')}`);
762
+ console.log(` Class: ${activeWindow.className || dim('(unknown)')}`);
763
+ console.log(` Handle: ${activeWindow.hwnd ?? dim('(unknown)')}`);
764
+ console.log(` Bounds: ${bounds.x},${bounds.y} ${bounds.width}x${bounds.height}`);
765
+ } else {
766
+ error(`Could not read active window (${uiaError || 'unknown error'})`);
767
+ info('Tip: try running `liku window --active` to confirm UI Automation is working.');
768
+ }
769
+
770
+ if (mouse) {
771
+ console.log(`\n${highlight('Mouse:')} ${mouse.x},${mouse.y}`);
772
+ }
773
+
774
+ if (Array.isArray(windows) && windows.length > 0) {
775
+ console.log(`\n${highlight(`Top windows (${windows.length}${options.all ? '' : ' shown'}):`)}`);
776
+ windows.slice(0, 10).forEach((w, idx) => {
777
+ const title = w.title || '(untitled)';
778
+ const proc = w.processName || '-';
779
+ const hwnd = w.hwnd ?? '?';
780
+ console.log(` ${idx + 1}. [${hwnd}] ${title} ${dim('—')} ${proc}`);
781
+ });
782
+ if (windows.length > 10) {
783
+ console.log(dim(' (Use --windows <n> or --all with --json for more)'));
784
+ }
785
+ }
786
+
787
+ // Helpful next-step hints for browser operations
788
+ console.log(`\n${highlight('Targeting tips:')}`);
789
+ console.log(` - Before sending keys, ensure the intended app is active.`);
790
+ console.log(` - For browsers: activate the correct tab first, then use ${highlight('ctrl+w')} to close the active tab.`);
791
+
792
+ if (requestAnalysis?.plan?.length) {
793
+ console.log(`\n${highlight('Suggested plan:')}`);
794
+ requestAnalysis.plan.forEach((step, i) => {
795
+ console.log(` ${i + 1}. ${highlight(step.state)}: ${step.command}`);
796
+ });
797
+ if (options.flow && requestAnalysis.mermaid) {
798
+ console.log(`\n${highlight('Flow (Mermaid):')}\n${requestAnalysis.mermaid}`);
799
+ }
800
+ }
801
+
802
+ // For debugging copy/paste
803
+ if (options.debug) {
804
+ const json = safeJsonStringify(report);
805
+ if (json) {
806
+ console.log(`\n${highlight('Raw JSON:')}\n${json}`);
807
+ }
808
+ }
809
+
810
+ if (ok) success('Doctor check OK');
811
+ }
812
+
813
+ return report;
814
+ }
815
+
816
+ module.exports = { run };
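The `ok` flag in the report is derived purely from the check tally: any `fail` flips it to false, while warnings alone do not. A minimal standalone sketch of that tally follows. The check ids are hypothetical and only the `status` field is read; the lookup-based counting is an equivalent rewrite of the if/else-if chain above, not the package's code.

```javascript
// Standalone sketch of the pass/warn/fail tally used by `run`.
// The check ids below are hypothetical; only `status` is inspected.
function summarizeChecks(checks) {
  const summary = { pass: 0, warn: 0, fail: 0 };
  for (const c of checks) {
    // Unknown statuses are ignored, like the if/else-if chain in the diff.
    if (Object.prototype.hasOwnProperty.call(summary, c.status)) {
      summary[c.status] += 1;
    }
  }
  return summary;
}

const sampleChecks = [
  { id: 'uia.available', status: 'pass' },
  { id: 'window.active', status: 'warn' },
  { id: 'request.parsed', status: 'pass' },
];

const checksSummary = summarizeChecks(sampleChecks);
const ok = checksSummary.fail === 0; // warnings alone do not flip `ok`

console.log(checksSummary, ok);
```

A caller consuming `liku doctor --json` could apply the same rule to `report.checksSummary` rather than re-counting `report.checks`.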
package/src/cli/liku.js CHANGED
@@ -1,4 +1,4 @@
1
- #!/usr/bin/env node
1
+ #!/usr/bin/env node
2
2
  /**
3
3
  * liku - Copilot-Liku CLI
4
4
  *
@@ -36,6 +36,7 @@ const pkg = require(path.join(PROJECT_ROOT, 'package.json'));
36
36
  // Command registry
37
37
  const COMMANDS = {
38
38
  start: { desc: 'Start the Electron agent with overlay', file: 'start' },
39
+ doctor: { desc: 'Diagnostics: version, environment, active window', file: 'doctor' },
39
40
  click: { desc: 'Click element by text or coordinates', file: 'click', args: '<text|x,y>' },
40
41
  find: { desc: 'Find UI elements matching criteria', file: 'find', args: '<text>' },
41
42
  type: { desc: 'Type text at current cursor position', file: 'type', args: '<text>' },
@@ -486,10 +486,64 @@ function getPlatformContext() {
486
486
  - **Screenshot**: \`win+shift+s\`
487
487
 
488
488
  ### Windows Terminal Shortcuts
489
- - **New tab**: \`ctrl+shift+t\`
490
- - **Close tab**: \`ctrl+shift+w\`
489
+ - (Windows Terminal only) **New tab**: \`ctrl+shift+t\`
490
+ - (Windows Terminal only) **Close tab**: \`ctrl+shift+w\`
491
491
  - **Split pane**: \`alt+shift+d\`
492
492
 
493
+ ### Browser Tab Shortcuts (Edge/Chrome)
494
+ - **New tab**: \`ctrl+t\`
495
+ - **Close tab**: \`ctrl+w\`
496
+ - **Reopen closed tab**: \`ctrl+shift+t\`
497
+ - **Close window**: \`ctrl+shift+w\`
498
+
499
+ ### Browser Automation Policy (Robust)
500
+ When the user asks to **use an existing browser window/tab** (Edge/Chrome), prefer **in-window control** (focus + keys) instead of launching processes.
501
+
502
+ - **DO NOT** use PowerShell COM \`SendKeys\` or \`Start-Process msedge\` / \`microsoft-edge:\` to control an existing tab. These are unreliable and may open new windows/tabs unexpectedly.
503
+ - **DO** use Liku actions: \`bring_window_to_front\` / \`focus_window\` + \`key\` + \`type\` + \`wait\`.
504
+ - **Chain the whole flow in one action block** so focus is maintained; avoid pausing for manual validation.
505
+
506
+ **Reliable recipes:**
507
+ - **Open a new tab in the existing Edge/Chrome window**:
508
+ 1) bring window to front
509
+ 2) wait 300–800ms
510
+ 3) \`ctrl+t\`
511
+ 4) wait 200–500ms
512
+ - **Navigate the current tab to a URL**:
513
+ 1) \`ctrl+l\` (address bar)
514
+ 2) wait 150–300ms
515
+ 3) type full URL (prefer \`https://...\`)
516
+ 4) \`enter\`
517
+ 5) wait 2000–5000ms (page load)
518
+ - **Self-heal if text drops/mis-types**: \`ctrl+l\` → \`ctrl+a\` → type again → \`enter\` (add waits)
519
+ - **YouTube search (keyboard-first)**: press \`/\` to focus search → type query → \`enter\` → wait
520
+
521
+ **Verification guidance:**
522
+ - If unsure whether the right window/tab is active, take a quick \`screenshot\` and proceed only when the browser is clearly focused.
523
+ - Validate major state changes (after focus, after navigation, after submitting search). If validation fails, retry focus + navigation (bounded retries).
524
+
525
+ ### Focus Rule (CRITICAL)
526
+ Before sending keyboard shortcuts, make sure the intended app window is focused.
527
+ If the overlay/chat has focus, shortcuts like \`ctrl+w\` / \`ctrl+shift+w\` may close the overlay instead of the target app.
528
+
529
+ ### Target Verification (CRITICAL)
530
+ - For any action that affects a specific app (especially browsers), **verify the active window is correct before executing**.
531
+ - Prefer this sequence:
532
+ 1) Bring the target window to front (e.g., Edge)
533
+ 2) Confirm active window (title/process)
534
+ 3) Only then send keys/clicks
535
+ - If unsure, take a screenshot for confirmation.
536
+
537
+ ### Browser Tab Targeting (Edge/Chrome)
538
+ - You generally **cannot safely close a specific tab by title** unless you first make that tab active.
539
+ - Prefer:
540
+ 1) Focus Edge/Chrome window
541
+ 2) Activate the tab by clicking its title in the tab strip (UIA or coordinate click)
542
+ 3) Then close tab with \`ctrl+w\`
543
+ - If the tab title is not discoverable via UI Automation, use keyboard strategies:
544
+ - \`ctrl+1..8\` switch to tab 1..8, \`ctrl+9\` switches to last tab
545
+ - \`ctrl+tab\` / \`ctrl+shift+tab\` cycle tabs (add waits)
546
+
493
547
  ### IMPORTANT: On Windows, NEVER use:
494
548
  - \`cmd+space\` (that's macOS Spotlight)
495
549
  - \`ctrl+alt+t\` (that's Linux terminal shortcut)`;
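The "Navigate the current tab to a URL" recipe and the Focus Rule in the prompt text above can be sketched as one chained action block. This is an illustration under assumptions: only `type`, `key`, and `reason` appear as action fields in this diff, so `title`, `text`, and `ms` are hypothetical field names, and `focusesBeforeInput` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical action block for "navigate the current tab to a URL".
// Only `type`, `key`, and `reason` appear in the diff; `title`, `text`,
// and `ms` are assumed field names used for illustration.
const actionData = {
  actions: [
    { type: 'bring_window_to_front', title: 'Edge', reason: 'Target the existing browser window' },
    { type: 'wait', ms: 500 },
    { type: 'key', key: 'ctrl+l', reason: 'Focus the address bar' },
    { type: 'wait', ms: 200 },
    { type: 'type', text: 'https://example.com', reason: 'Enter the full URL' },
    { type: 'key', key: 'enter', reason: 'Navigate' },
    { type: 'wait', ms: 3000 },
  ],
};

const FOCUS_TYPES = new Set(['focus_window', 'bring_window_to_front']);
const INPUT_TYPES = new Set(['key', 'type']);

// Returns true when the first keyboard action comes after a focus action,
// which is the ordering the Focus Rule asks for.
function focusesBeforeInput(actions) {
  const firstInput = actions.findIndex((a) => INPUT_TYPES.has(a.type));
  if (firstInput === -1) return true; // no keyboard input, nothing to guard
  return actions.slice(0, firstInput).some((a) => FOCUS_TYPES.has(a.type));
}

console.log(focusesBeforeInput(actionData.actions));
```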
@@ -1892,6 +1946,27 @@ function analyzeActionSafety(action, targetInfo = {}) {
1892
1946
  case 'key':
1893
1947
  // Analyze key combinations
1894
1948
  const key = (action.key || '').toLowerCase();
1949
+ const keyNorm = key.replace(/\s+/g, '');
1950
+
1951
+ // Treat window/tab/app-close shortcuts as CRITICAL risk: they can instantly close the overlay,
1952
+ // the active terminal tab/window, a browser window, or dismiss important dialogs.
1953
+ // Require explicit confirmation so smaller models can't accidentally "self-close" the UI.
1954
+ const closeCombos = [
1955
+ 'alt+f4',
1956
+ 'ctrl+w',
1957
+ 'ctrl+shift+w',
1958
+ 'ctrl+q',
1959
+ 'ctrl+shift+q',
1960
+ 'cmd+w',
1961
+ 'cmd+q',
1962
+ ];
1963
+ if (closeCombos.includes(keyNorm)) {
1964
+ result.riskLevel = ActionRiskLevel.CRITICAL;
1965
+ result.warnings.push(`Close shortcut detected: ${action.key}`);
1966
+ result.requiresConfirmation = true;
1967
+ break;
1968
+ }
1969
+
1895
1970
  if (key.includes('delete') || key.includes('backspace')) {
1896
1971
  result.riskLevel = ActionRiskLevel.HIGH;
1897
1972
  result.warnings.push('Delete/Backspace key may remove content');
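The close-shortcut check in this hunk lowercases the key string, strips whitespace, and then does an exact match against a fixed list. The sketch below isolates that normalization so its behavior is easy to see; `isCloseShortcut` is a hypothetical helper name, and the `Set` is a substitution for the array plus `includes` used in the diff.

```javascript
// Sketch of the close-shortcut detection: normalize, then exact-match.
const CLOSE_COMBOS = new Set([
  'alt+f4',
  'ctrl+w',
  'ctrl+shift+w',
  'ctrl+q',
  'ctrl+shift+q',
  'cmd+w',
  'cmd+q',
]);

function isCloseShortcut(rawKey) {
  // Same normalization as the diff: lowercase, then drop all whitespace.
  const keyNorm = String(rawKey || '').toLowerCase().replace(/\s+/g, '');
  return CLOSE_COMBOS.has(keyNorm);
}

console.log(isCloseShortcut('Ctrl + W'));     // matches after normalization
console.log(isCloseShortcut('ctrl+t'));       // not a close shortcut
console.log(isCloseShortcut('shift+ctrl+w')); // modifier order differs, so no match
```

Because the match is exact, a combo written with its modifiers in a different order is not detected. Whether callers ever emit such strings is not shown in this diff.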
@@ -2126,9 +2201,21 @@ async function executeActions(actionData, onAction = null, onScreenshot = null,
2126
2201
  const results = [];
2127
2202
  let screenshotRequested = false;
2128
2203
  let pendingConfirmation = false;
2204
+ let lastTargetWindowHandle = null;
2129
2205
 
2130
2206
  for (let i = 0; i < actionData.actions.length; i++) {
2131
2207
  const action = actionData.actions[i];
2208
+
2209
+ // Track the intended target window across steps so later key/type actions can
2210
+ // re-focus it. Without this, focus can drift back to the overlay/terminal.
2211
+ if (action.type === 'focus_window' || action.type === 'bring_window_to_front') {
2212
+ try {
2213
+ const hwnd = await systemAutomation.resolveWindowHandle(action);
2214
+ if (hwnd) {
2215
+ lastTargetWindowHandle = hwnd;
2216
+ }
2217
+ } catch {}
2218
+ }
2132
2219
 
2133
2220
  // Handle screenshot requests specially
2134
2221
  if (action.type === 'screenshot') {
@@ -2151,9 +2238,14 @@ async function executeActions(actionData, onAction = null, onScreenshot = null,
2151
2238
  // Analyze safety
2152
2239
  const safety = analyzeActionSafety(action, targetInfo);
2153
2240
  console.log(`[AI-SERVICE] Action ${i} safety: ${safety.riskLevel}`, safety.warnings);
2241
+
2242
+ // CRITICAL actions require an explicit confirmation step, even if the user clicked
2243
+ // the general "Execute" button for a batch. This prevents accidental destructive
2244
+ // shortcuts (e.g., alt+f4) from immediately closing the active app due to focus issues.
2245
+ const canBypassConfirmation = skipSafetyConfirmation && safety.riskLevel !== ActionRiskLevel.CRITICAL;
2154
2246
 
2155
2247
  // If HIGH or CRITICAL risk, require confirmation (Execute-button pre-confirmation covers HIGH only; CRITICAL always prompts)
2156
- if (safety.requiresConfirmation && !skipSafetyConfirmation) {
2248
+ if (safety.requiresConfirmation && !canBypassConfirmation) {
2157
2249
  console.log(`[AI-SERVICE] Action ${i} requires user confirmation`);
2158
2250
 
2159
2251
  // Store as pending action
@@ -2176,7 +2268,11 @@ async function executeActions(actionData, onAction = null, onScreenshot = null,
2176
2268
  }
2177
2269
 
2178
2270
  if (skipSafetyConfirmation && safety.requiresConfirmation) {
2179
- console.log(`[AI-SERVICE] Action ${i} safety bypassed (user pre-confirmed via Execute button)`);
2271
+ if (canBypassConfirmation) {
2272
+ console.log(`[AI-SERVICE] Action ${i} safety bypassed (user pre-confirmed via Execute button)`);
2273
+ } else {
2274
+ console.log(`[AI-SERVICE] Action ${i} requires explicit confirmation (CRITICAL)`);
2275
+ }
2180
2276
  }
2181
2277
 
2182
2278
  // Execute the action (SAFE/LOW/MEDIUM risk)
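The effect of `canBypassConfirmation` is easiest to see as a truth table: pre-confirmation via the Execute button still skips the prompt for HIGH-risk actions, but never for CRITICAL ones. The sketch below reproduces only that condition. The string values of `ActionRiskLevel` are assumed (the enum's definition is not part of this diff), and `needsConfirmation` is a hypothetical wrapper.

```javascript
// Assumed enum values; the real ActionRiskLevel definition is not shown in the diff.
const ActionRiskLevel = {
  SAFE: 'SAFE',
  LOW: 'LOW',
  MEDIUM: 'MEDIUM',
  HIGH: 'HIGH',
  CRITICAL: 'CRITICAL',
};

// Mirrors the condition added in executeActions.
function needsConfirmation(safety, skipSafetyConfirmation) {
  const canBypassConfirmation =
    skipSafetyConfirmation && safety.riskLevel !== ActionRiskLevel.CRITICAL;
  return Boolean(safety.requiresConfirmation && !canBypassConfirmation);
}

const high = { riskLevel: ActionRiskLevel.HIGH, requiresConfirmation: true };
const critical = { riskLevel: ActionRiskLevel.CRITICAL, requiresConfirmation: true };

console.log(needsConfirmation(high, false));     // prompt
console.log(needsConfirmation(high, true));      // bypassed by Execute
console.log(needsConfirmation(critical, true));  // still prompts
```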
@@ -2186,6 +2282,7 @@ async function executeActions(actionData, onAction = null, onScreenshot = null,
2186
2282
  if (uiWatcher && uiWatcher.isPolling) {
2187
2283
  const elementAtPoint = uiWatcher.getElementAtPoint(action.x, action.y);
2188
2284
  if (elementAtPoint && elementAtPoint.windowHandle) {
2285
+ lastTargetWindowHandle = elementAtPoint.windowHandle;
2189
2286
  // Found an element with a known window handle
2190
2287
  // Focus it first to ensure click goes to the right window (not trapped by overlay or obscuring window)
2191
2288
  // We can call systemAutomation.focusWindow directly
@@ -2196,11 +2293,35 @@ async function executeActions(actionData, onAction = null, onScreenshot = null,
2196
2293
  }
2197
2294
  }
2198
2295
 
2296
+ // Ensure keyboard input goes to the last known target window.
2297
+ if ((action.type === 'key' || action.type === 'type') && lastTargetWindowHandle) {
2298
+ console.log(`[AI-SERVICE] Re-focusing last target window ${lastTargetWindowHandle} before ${action.type}`);
2299
+ await systemAutomation.focusWindow(lastTargetWindowHandle);
2300
+ await new Promise(r => setTimeout(r, 125));
2301
+ }
2302
+
2199
2303
  const result = await (actionExecutor ? actionExecutor(action) : systemAutomation.executeAction(action));
2200
2304
  result.reason = action.reason || '';
2201
2305
  result.safety = safety;
2202
2306
  results.push(result);
2203
2307
 
2308
+ // If we just performed a step that likely changed focus, snapshot the actual foreground HWND.
2309
+ // This is especially important when uiWatcher isn't polling (can't infer windowHandle).
2310
+ if (typeof systemAutomation.getForegroundWindowHandle === 'function') {
2311
+ if (
2312
+ action.type === 'click' ||
2313
+ action.type === 'double_click' ||
2314
+ action.type === 'right_click' ||
2315
+ action.type === 'focus_window' ||
2316
+ action.type === 'bring_window_to_front'
2317
+ ) {
2318
+ const fg = await systemAutomation.getForegroundWindowHandle();
2319
+ if (fg) {
2320
+ lastTargetWindowHandle = fg;
2321
+ }
2322
+ }
2323
+ }
2324
+
2204
2325
  // Callback for UI updates
2205
2326
  if (onAction) {
2206
2327
  onAction(result, i, actionData.actions.length);
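The focus tracking added to `executeActions` can be summarized as: remember the handle from the most recent focus-type action, and re-focus it before each `key` or `type` action. The following is a simplified, synchronous sketch of that ordering only. It leaves out the click-based tracking and the foreground snapshot, and `planRefocus`, the `resolveHandle` callback, and the `hwnd` field on the sample actions are hypothetical stand-ins for `systemAutomation.resolveWindowHandle`.

```javascript
// Simplified model of the re-focus ordering; no real windows are touched.
function planRefocus(actions, resolveHandle) {
  let lastTargetWindowHandle = null;
  const log = [];
  for (const action of actions) {
    if (action.type === 'focus_window' || action.type === 'bring_window_to_front') {
      const hwnd = resolveHandle(action);
      if (hwnd) lastTargetWindowHandle = hwnd;
    }
    // Keyboard input is preceded by a re-focus once a target is known.
    if ((action.type === 'key' || action.type === 'type') && lastTargetWindowHandle) {
      log.push(`refocus:${lastTargetWindowHandle}`);
    }
    log.push(`run:${action.type}`);
  }
  return log;
}

const steps = planRefocus(
  [
    { type: 'key', key: 'ctrl+t' },          // no target known yet, no re-focus
    { type: 'focus_window', hwnd: 42 },      // `hwnd` field is hypothetical
    { type: 'key', key: 'ctrl+l' },
    { type: 'type', text: 'https://example.com' },
  ],
  (action) => action.hwnd
);

console.log(steps);
```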
@@ -2242,10 +2363,20 @@ async function resumeAfterConfirmation(onAction = null, onScreenshot = null, opt
2242
2363
 
2243
2364
  const results = [...pending.completedResults];
2244
2365
  let screenshotRequested = false;
2366
+ let lastTargetWindowHandle = null;
2245
2367
 
2246
2368
  // Execute the confirmed action and remaining actions
2247
2369
  for (let i = 0; i < pending.remainingActions.length; i++) {
2248
2370
  const action = pending.remainingActions[i];
2371
+
2372
+ if (action.type === 'focus_window' || action.type === 'bring_window_to_front') {
2373
+ try {
2374
+ const hwnd = await systemAutomation.resolveWindowHandle(action);
2375
+ if (hwnd) {
2376
+ lastTargetWindowHandle = hwnd;
2377
+ }
2378
+ } catch {}
2379
+ }
2249
2380
 
2250
2381
  if (action.type === 'screenshot') {
2251
2382
  screenshotRequested = true;
@@ -2255,12 +2386,45 @@ async function resumeAfterConfirmation(onAction = null, onScreenshot = null, opt
2255
2386
  results.push({ success: true, action: 'screenshot', message: 'Screenshot captured' });
2256
2387
  continue;
2257
2388
  }
2389
+
2390
+ if ((action.type === 'click' || action.type === 'double_click' || action.type === 'right_click') && action.x !== undefined) {
2391
+ if (uiWatcher && uiWatcher.isPolling) {
2392
+ const elementAtPoint = uiWatcher.getElementAtPoint(action.x, action.y);
2393
+ if (elementAtPoint && elementAtPoint.windowHandle) {
2394
+ lastTargetWindowHandle = elementAtPoint.windowHandle;
2395
+ console.log(`[AI-SERVICE] (resume) Auto-focusing window handle ${elementAtPoint.windowHandle} for click at (${action.x}, ${action.y})`);
2396
+ await systemAutomation.focusWindow(elementAtPoint.windowHandle);
2397
+ await new Promise(r => setTimeout(r, 450));
2398
+ }
2399
+ }
2400
+ }
2401
+
2402
+ if ((action.type === 'key' || action.type === 'type') && lastTargetWindowHandle) {
2403
+ console.log(`[AI-SERVICE] (resume) Re-focusing last target window ${lastTargetWindowHandle} before ${action.type}`);
2404
+ await systemAutomation.focusWindow(lastTargetWindowHandle);
2405
+ await new Promise(r => setTimeout(r, 125));
2406
+ }
2258
2407
 
2259
2408
  // Execute action (user confirmed, skip safety for first action)
2260
2409
  const result = await (actionExecutor ? actionExecutor(action) : systemAutomation.executeAction(action));
2261
2410
  result.reason = action.reason || '';
2262
2411
  result.userConfirmed = i === 0; // First one was confirmed
2263
2412
  results.push(result);
2413
+
2414
+ if (typeof systemAutomation.getForegroundWindowHandle === 'function') {
2415
+ if (
2416
+ action.type === 'click' ||
2417
+ action.type === 'double_click' ||
2418
+ action.type === 'right_click' ||
2419
+ action.type === 'focus_window' ||
2420
+ action.type === 'bring_window_to_front'
2421
+ ) {
2422
+ const fg = await systemAutomation.getForegroundWindowHandle();
2423
+ if (fg) {
2424
+ lastTargetWindowHandle = fg;
2425
+ }
2426
+ }
2427
+ }
2264
2428
 
2265
2429
  if (onAction) {
2266
2430
  onAction(result, pending.actionIndex + i, pending.actionIndex + pending.remainingActions.length);
package/src/main/index.js CHANGED
@@ -1,3 +1,65 @@
1
+ function isBrokenPipeLikeError(err) {
2
+ const code = err && err.code;
3
+ return (
4
+ code === 'EPIPE' ||
5
+ code === 'ERR_STREAM_DESTROYED' ||
6
+ code === 'ERR_STREAM_WRITE_AFTER_END'
7
+ );
8
+ }
9
+
10
+ function patchConsoleForBrokenPipes() {
11
+ const methods = ['log', 'info', 'warn', 'error'];
12
+ const originals = {};
13
+ let stdioDisabled = false;
14
+
15
+ for (const method of methods) {
16
+ originals[method] = typeof console[method] === 'function'
17
+ ? console[method].bind(console)
18
+ : () => {};
19
+
20
+ console[method] = (...args) => {
21
+ if (stdioDisabled) return;
22
+ try {
23
+ originals[method](...args);
24
+ } catch (e) {
25
+ if (isBrokenPipeLikeError(e)) {
26
+ stdioDisabled = true;
27
+ return;
28
+ }
29
+ throw e;
30
+ }
31
+ };
32
+ }
33
+
34
+ const swallowStreamError = (stream) => {
35
+ if (!stream || typeof stream.on !== 'function') return;
36
+ stream.on('error', (e) => {
37
+ if (isBrokenPipeLikeError(e)) {
38
+ stdioDisabled = true;
39
+ return;
40
+ }
41
+ });
42
+ };
43
+
44
+ swallowStreamError(process.stdout);
45
+ swallowStreamError(process.stderr);
46
+ }
47
+
48
+ patchConsoleForBrokenPipes();
49
+
50
+ process.on('uncaughtException', (err) => {
51
+ if (isBrokenPipeLikeError(err)) {
52
+ return;
53
+ }
54
+ throw err;
55
+ });
56
+
57
+ process.on('unhandledRejection', (reason) => {
58
+ if (isBrokenPipeLikeError(reason)) {
59
+ return;
60
+ }
61
+ });
62
+
1
63
  // Ensure Electron runs in app mode even if a dev shell has ELECTRON_RUN_AS_NODE set
2
64
  if (process.env.ELECTRON_RUN_AS_NODE) {
3
65
  console.warn('ELECTRON_RUN_AS_NODE was set; clearing so the app can start normally.');
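The console patch in this hunk disables further output once a broken-pipe style error is seen, and rethrows anything else. A minimal sketch of that guard, detached from `console`, is shown below. `makeGuardedLogger` and the failing sink are hypothetical; `isBrokenPipeLikeError` is copied from the diff.

```javascript
function isBrokenPipeLikeError(err) {
  const code = err && err.code;
  return (
    code === 'EPIPE' ||
    code === 'ERR_STREAM_DESTROYED' ||
    code === 'ERR_STREAM_WRITE_AFTER_END'
  );
}

// Wraps a write function so the first broken-pipe error silences it for good.
function makeGuardedLogger(write) {
  let disabled = false;
  return (...args) => {
    if (disabled) return false;
    try {
      write(...args);
      return true;
    } catch (e) {
      if (isBrokenPipeLikeError(e)) {
        disabled = true;
        return false;
      }
      throw e; // unrelated errors still surface
    }
  };
}

// Hypothetical sink that always fails with EPIPE.
let calls = 0;
const log = makeGuardedLogger(() => {
  calls += 1;
  const e = new Error('write EPIPE');
  e.code = 'EPIPE';
  throw e;
});

log('first');  // swallowed; the logger disables itself
log('second'); // skipped; the sink is not called again
```

One observation on the hunk itself: the `unhandledRejection` handler returns without rethrowing in both branches, so as written it also silences rejections that are not broken-pipe errors.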
@@ -488,7 +488,24 @@ public class WindowFocus {
488
488
  [WindowFocus]::Focus([IntPtr]::new(${hwnd}))
489
489
  `;
490
490
  await executePowerShell(script);
491
- console.log(`[AUTOMATION] Focused window handle: ${hwnd}`);
491
+
492
+ // Poll to verify focus actually stuck (SetForegroundWindow can be racy / blocked)
493
+ let verified = false;
494
+ for (let attempt = 0; attempt < 10; attempt++) {
495
+ const fg = await getForegroundWindowHandle();
496
+ if (fg === hwnd) {
497
+ verified = true;
498
+ break;
499
+ }
500
+ await sleep(50);
501
+ }
502
+
503
+ if (verified) {
504
+ console.log(`[AUTOMATION] Focused window handle (verified): ${hwnd}`);
505
+ } else {
506
+ const fg = await getForegroundWindowHandle();
507
+ console.warn(`[AUTOMATION] Focus requested for ${hwnd} but foreground is ${fg}`);
508
+ }
492
509
  }
493
510
 
494
511
  /**
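The verification loop added to the focus routine is a bounded poll: up to 10 attempts, 50 ms apart, comparing the foreground handle with the requested one. A generic sketch with an injectable foreground getter follows. `pollUntilForeground` and `fakeForeground` are hypothetical names, and the numeric coercion is a deliberate deviation from the diff's strict `fg === hwnd` comparison, which relies on both values having the same type.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls `getForeground` until it reports `hwnd` or the attempts run out.
async function pollUntilForeground(getForeground, hwnd, { attempts = 10, delayMs = 50 } = {}) {
  for (let attempt = 0; attempt < attempts; attempt++) {
    const fg = await getForeground();
    // Coerce both sides so '42' and 42 compare equal (assumption of this sketch).
    if (fg != null && Number(fg) === Number(hwnd)) return true;
    await sleep(delayMs);
  }
  return false;
}

// Fake getter: reports another window twice, then the requested one.
let calls = 0;
const fakeForeground = async () => (++calls >= 3 ? 42 : 7);

const verified = pollUntilForeground(fakeForeground, '42', { delayMs: 1 });
verified.then((v) => console.log('verified:', v, 'after', calls, 'calls'));
```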
@@ -1698,6 +1715,29 @@ public class WindowInfo {
1698
1715
  return await executePowerShell(script);
1699
1716
  }
1700
1717
 
1718
+ /**
1719
+ * Get current foreground window handle (HWND)
1720
+ */
1721
+ async function getForegroundWindowHandle() {
1722
+ const script = `
1723
+ Add-Type -TypeDefinition @"
1724
+ using System;
1725
+ using System.Runtime.InteropServices;
1726
+ public class ForegroundHandle {
1727
+ [DllImport("user32.dll")]
1728
+ public static extern IntPtr GetForegroundWindow();
1729
+ public static long GetHandle() {
1730
+ return GetForegroundWindow().ToInt64();
1731
+ }
1732
+ }
1733
+ "@
1734
+ [ForegroundHandle]::GetHandle()
1735
+ `;
1736
+ const out = await executePowerShell(script);
1737
+ const num = Number(String(out).trim());
1738
+ return Number.isFinite(num) ? num : null;
1739
+ }
1740
+
1701
1741
  /**
1702
1742
  * Execute an action from AI
1703
1743
  * @param {Object} action - Action object from AI
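`getForegroundWindowHandle` turns PowerShell's text output into a number with `Number(String(out).trim())`. One edge case worth knowing is that `Number('')` is `0`, so empty output yields `0` rather than `null`; the callers in this diff treat a falsy handle as "unknown", so the result is the same for them. The sketch below makes that guard explicit. `parseHandle` is a hypothetical helper and the empty-string check is an addition of this sketch, not the package's code.

```javascript
// Parses a window handle from command output; returns null when unusable.
function parseHandle(out) {
  const text = String(out ?? '').trim();
  if (text === '') return null; // Number('') is 0, so guard empty output explicitly
  const num = Number(text);
  return Number.isFinite(num) ? num : null;
}

console.log(parseHandle('  132456\r\n'));
console.log(parseHandle(''));
console.log(parseHandle('Add-Type : error'));
```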
@@ -2109,6 +2149,7 @@ module.exports = {
2109
2149
  drag,
2110
2150
  sleep,
2111
2151
  getActiveWindowTitle,
2152
+ getForegroundWindowHandle,
2112
2153
  resolveWindowHandle,
2113
2154
  minimizeWindow,
2114
2155
  restoreWindow,