copilot-liku-cli 0.0.10 → 0.0.11

package/INSTALLATION.md CHANGED
@@ -155,6 +155,29 @@ npm link
155
155
 
156
156
  This creates a symbolic link from your global `node_modules` to your local development directory. Any changes you make will be immediately available when you run `liku`.
157
157
 
158
+ ### 3b. Use the local repo version in another project (same machine)
159
+
160
+ If you want another project (e.g., `C:\dev\Whatup`) to use this local working copy instead of the npm-published version:
161
+
162
+ From the other project folder:
163
+
164
+ ```bash
165
+ npm link copilot-liku-cli
166
+ ```
167
+
168
+ Recommended verification (ensures you are using the local linked binary):
169
+
170
+ ```bash
171
+ npx --no-install liku doctor --json
172
+ ```
173
+
174
+ To switch the other project back to the published npm package:
175
+
176
+ ```bash
177
+ npm unlink copilot-liku-cli
178
+ npm install copilot-liku-cli
179
+ ```
180
+
158
181
  ### 4. Verify Setup
159
182
 
160
183
  ```bash
package/QUICKSTART.md CHANGED
@@ -42,6 +42,37 @@ liku start
42
42
  npm start
43
43
  ```
44
44
 
45
+ #### Option 3: Use the local repo version in another project (recommended for dev)
46
+
47
+ If you want a different project (e.g., `C:\dev\Whatup`) to use your *local working copy* of this repo (instead of the npm-published version), use `npm link`.
48
+
49
+ From the repo root:
50
+
51
+ ```bash
52
+ npm link
53
+ ```
54
+
55
+ From the other project:
56
+
57
+ ```bash
58
+ npm link copilot-liku-cli
59
+ ```
60
+
61
+ Verify you’re running the repo copy (recommended):
62
+
63
+ ```bash
64
+ npx --no-install liku doctor --json
65
+ ```
66
+
67
+ Look for `env.projectRoot` being the repo path (e.g., `C:\dev\copilot-Liku-cli`).
68
+
69
+ To switch back to the published npm version:
70
+
71
+ ```bash
72
+ npm unlink copilot-liku-cli
73
+ npm i copilot-liku-cli
74
+ ```
75
+
45
76
  ## Quick Verify (Recommended)
46
77
 
47
78
  After install, run these checks in order:
@@ -76,6 +107,25 @@ For deterministic, machine-readable output (recommended for smaller models / aut
76
107
  liku doctor --json
77
108
  ```
78
109
 
110
+ #### `doctor.v1` schema contract (for smaller models)
111
+
112
+ When you consume `liku doctor --json`, treat it as the source-of-truth for targeting and planning. The output is a single JSON object with:
113
+
114
+ - `schemaVersion` (string): currently `doctor.v1`.
115
+ - `ok` (boolean): `false` means at least one `checks[].status === "fail"`.
116
+ - `checks[]` (array): structured checks with `{ id, status: "pass"|"warn"|"fail", message, details? }`.
117
+ - `uiState` (object): UI Automation snapshot
118
+ - `uiState.activeWindow`: where input will go *right now*
119
+ - `uiState.windows[]`: discovered top-level windows (bounded unless `--all`)
120
+ - `targeting` (object | null): present when `doctor` is given a request text
121
+ - `targeting.selectedWindow`: the best-matched window candidate
122
+ - `targeting.candidates[]`: scored alternatives (for disambiguation)
123
+ - `plan` (object | null): present when a request is provided and a plan can be generated
124
+ - `plan.steps[]`: ordered steps, each with `{ state, goal, command, verification, notes? }`
125
+ - `next.commands[]` (array of strings): copy/paste-ready commands extracted from `plan.steps[].command`.
126
+
127
+ **Deterministic execution rule:** run `plan.steps[]` in order, and re-check `liku window --active` after any focus change before sending keys.
128
+
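A consumer of this contract could be sketched as follows. The sketch reads only the fields listed above; the helper name `summarizeDoctorReport` and the sample object are hypothetical, and real output may carry additional fields. It does not execute anything, so the re-check of `liku window --active` after focus changes is left to the caller.

```javascript
// Minimal sketch of a doctor.v1 consumer. Only fields named in the schema
// contract are read; `summarizeDoctorReport` is a hypothetical helper.
function summarizeDoctorReport(report) {
  if (!report || report.schemaVersion !== 'doctor.v1') {
    throw new Error(`Unsupported schemaVersion: ${report && report.schemaVersion}`);
  }
  const checks = Array.isArray(report.checks) ? report.checks : [];
  const failed = checks.filter((c) => c.status === 'fail').map((c) => c.id);
  // `plan` is null when no request text was given or no plan could be built.
  const steps = report.plan && Array.isArray(report.plan.steps) ? report.plan.steps : [];
  return {
    ok: report.ok === true,
    failed,
    // Commands in execution order, taken from plan.steps[].command.
    commands: steps.map((s) => s.command),
  };
}

// Hand-written sample for illustration; not captured from a real run.
const sampleReport = {
  schemaVersion: 'doctor.v1',
  ok: true,
  checks: [{ id: 'example-check', status: 'pass', message: 'ok' }],
  uiState: { activeWindow: null, windows: [] },
  targeting: null,
  plan: {
    steps: [
      { state: 'FOCUS_ADDRESS_BAR', goal: 'Focus the address bar', command: 'liku keys ctrl+l', verification: 'Address bar is focused' },
      { state: 'NAVIGATE', goal: 'Navigate to the URL', command: 'liku keys enter', verification: 'Page begins loading' },
    ],
  },
};

const summary = summarizeDoctorReport(sampleReport);
console.log(summary.commands);
```

Rejecting unknown `schemaVersion` values up front keeps a small model from acting on a shape it was not written for.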
79
129
  `smoke:shortcuts` intentionally validates chat visibility via direct in-app
80
130
  toggle and validates keyboard routing on overlay with target gating.
81
131
 
@@ -163,6 +213,23 @@ When automating browsers, be explicit about **targeting**:
163
213
 
164
214
  If you skip steps 1–2 and the overlay/chat has focus, keyboard shortcuts may close the overlay instead of affecting the browser.
165
215
 
216
+ #### Robust recipe (recommended)
217
+
218
+ If your intent is to **continue in an existing Edge/Chrome window/tab**, prefer **in-window control** (focus + keyboard) over launching the browser again.
219
+
220
+ - Prefer: **focus window → new tab / address bar → type → enter → verify**
221
+ - Avoid for “existing tab control”: PowerShell COM `SendKeys`, `Start-Process msedge ...`, and `microsoft-edge:...` (these often open new windows/tabs and can be flaky).
222
+
223
+ **Canonical flow (what to ask the agent to do):**
224
+ 1) Bring the **target browser window** (Edge/Chrome/Firefox/Brave/etc) to the foreground
225
+ 2) `ctrl+t` (new tab) then `ctrl+l` (address bar)
226
+ 3) Type a full URL (prefer `https://...`) and press Enter
227
+ 4) Wait for load, then perform page-level action (e.g., YouTube search)
228
+ 5) Validate after major steps; if typing drops characters, re-focus the address bar and retry
229
+
230
+ **Self-heal typing retry (when URL is wrong):**
231
+ `ctrl+l` → `ctrl+a` → type URL again → `enter`
232
+
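The self-heal retry above can be read as a bounded loop. The following is a sketch under stated assumptions: `sendKeys`, `typeText`, and `readAddressBar` are hypothetical callbacks supplied by the caller (they are not liku commands), and they are synchronous here only to keep the example short.

```javascript
// Bounded retry for typing a URL: ctrl+l -> ctrl+a -> type -> verify -> enter.
// All `io` callbacks are hypothetical stand-ins for real automation calls.
function typeUrlWithRetry(url, io, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    io.sendKeys('ctrl+l'); // focus the address bar
    io.sendKeys('ctrl+a'); // select any existing text so typing replaces it
    io.typeText(url);
    if (io.readAddressBar() === url) {
      io.sendKeys('enter'); // only navigate once the text is verified
      return { ok: true, attempts: attempt };
    }
  }
  return { ok: false, attempts: maxAttempts };
}

// Fake browser that drops the last three characters on the first attempt.
function makeFlakyBrowser() {
  let bar = '';
  let typed = 0;
  const keys = [];
  return {
    keys,
    sendKeys(combo) {
      keys.push(combo);
      if (combo === 'ctrl+a') bar = '';
    },
    typeText(text) {
      typed += 1;
      bar = typed === 1 ? text.slice(0, -3) : text;
    },
    readAddressBar() {
      return bar;
    },
  };
}

const flaky = makeFlakyBrowser();
const retryResult = typeUrlWithRetry('https://www.youtube.com', flaky);
console.log(retryResult, flaky.keys);
```

Capping the attempts keeps a persistent focus problem from turning into an endless typing loop.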
166
233
  ### Selecting a Screen Element
167
234
  ```
168
235
  1. Press Ctrl+Alt+Space to open chat
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "copilot-liku-cli",
3
- "version": "0.0.10",
3
+ "version": "0.0.11",
4
4
  "description": "GitHub Copilot CLI with headless agent + ultra-thin overlay architecture",
5
5
  "main": "src/main/index.js",
6
6
  "bin": {
@@ -79,6 +79,55 @@ function extractQuotedStrings(text) {
79
79
  return out;
80
80
  }
81
81
 
82
+ function escapeDoubleQuotes(text) {
83
+ return String(text || '').replace(/"/g, '\\"');
84
+ }
85
+
86
+ function extractUrlCandidate(text) {
87
+ const str = normalizeText(text);
88
+
89
+ // Full URL
90
+ const fullUrl = /(https?:\/\/[^\s"']+)/i.exec(str);
91
+ if (fullUrl?.[1]) return fullUrl[1];
92
+
93
+ // Common bare domains (keep conservative)
94
+ const bare = /\b([a-z0-9-]+\.)+(com|net|org|io|ai|dev|edu|gov)(\/[^\s"']*)?\b/i.exec(str);
95
+ if (bare?.[0]) return bare[0];
96
+
97
+ return null;
98
+ }
99
+
100
+ function extractSearchQuery(text) {
101
+ const str = normalizeText(text);
102
+ const quoted = extractQuotedStrings(str);
103
+
104
+ // Prefer quoted strings if user said search ... for "..."
105
+ const searchFor = /\bsearch\b/i.test(str) && /\bfor\b/i.test(str);
106
+ if (searchFor && quoted.length) return quoted[0];
107
+
108
+ // Unquoted: search (on/in)? (youtube/google)? for <rest>
109
+ const m = /\bsearch(?:\s+(?:on|in))?(?:\s+(?:youtube|google))?\s+for\s+([^\n\r.;]+)$/i.exec(str);
110
+ if (m?.[1]) return normalizeText(m[1]);
111
+
112
+ return null;
113
+ }
114
+
115
+ function toHttpsUrl(urlish) {
116
+ const u = normalizeText(urlish);
117
+ if (!u) return null;
118
+ if (/^https?:\/\//i.test(u)) return u;
119
+ return `https://${u}`;
120
+ }
121
+
122
+ function buildSearchUrl({ query, preferYouTube = false }) {
123
+ const q = normalizeText(query);
124
+ if (!q) return null;
125
+ if (preferYouTube) {
126
+ return `https://www.youtube.com/results?search_query=${encodeURIComponent(q)}`;
127
+ }
128
+ return `https://www.google.com/search?q=${encodeURIComponent(q)}`;
129
+ }
130
+
82
131
  function parseRequestHints(requestText) {
83
132
  const text = normalizeText(requestText);
84
133
  const lower = normalizeForMatch(text);
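To see how the URL helpers added in this hunk behave on typical requests, here is a small self-contained sketch. The three helpers are copied from the hunk; `normalizeText` is not shown in the diff, so the version below (collapse whitespace and trim) is an assumption.

```javascript
// Assumption: normalizeText collapses whitespace and trims. The real helper
// is defined elsewhere in the file and may differ.
function normalizeText(text) {
  return String(text ?? '').replace(/\s+/g, ' ').trim();
}

// Copied from the hunk above.
function extractUrlCandidate(text) {
  const str = normalizeText(text);
  const fullUrl = /(https?:\/\/[^\s"']+)/i.exec(str);
  if (fullUrl?.[1]) return fullUrl[1];
  const bare = /\b([a-z0-9-]+\.)+(com|net|org|io|ai|dev|edu|gov)(\/[^\s"']*)?\b/i.exec(str);
  if (bare?.[0]) return bare[0];
  return null;
}

function toHttpsUrl(urlish) {
  const u = normalizeText(urlish);
  if (!u) return null;
  if (/^https?:\/\//i.test(u)) return u;
  return `https://${u}`;
}

function buildSearchUrl({ query, preferYouTube = false }) {
  const q = normalizeText(query);
  if (!q) return null;
  if (preferYouTube) {
    return `https://www.youtube.com/results?search_query=${encodeURIComponent(q)}`;
  }
  return `https://www.google.com/search?q=${encodeURIComponent(q)}`;
}

// A bare domain is detected and upgraded to https.
console.log(toHttpsUrl(extractUrlCandidate('go to youtube.com and search')));
// A full URL wins over the bare-domain pattern.
console.log(extractUrlCandidate('open https://example.com/a?b=1 in a new tab'));
// A search query is encoded into a results URL.
console.log(buildSearchUrl({ query: 'lo-fi beats', preferYouTube: true }));
```

Encoding the query into the URL means the plan needs only one typing step instead of a second interaction with the page's own search box.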
@@ -90,18 +139,42 @@ function parseRequestHints(requestText) {
90
139
  const inWindowMatch = /\b(?:in|within)\s+([^\n\r]+?)\s+window\b/i.exec(text);
91
140
  const windowHint = inWindowMatch ? normalizeText(inWindowMatch[1]) : null;
92
141
 
142
+ const wantsNewTab = /\bnew\s+tab\b/i.test(text) || /\bopen\s+a\s+new\s+tab\b/i.test(text);
143
+ const urlCandidate = extractUrlCandidate(text);
144
+ const searchQuery = extractSearchQuery(text);
145
+
146
+ const wantsIntegratedBrowser = /\b(integrated\s+browser|simple\s+browser|inside\s+vs\s*code|in\s+vs\s*code|vscode\s+insiders|workbench\.browser\.openlocalhostlinks|live\s+preview)\b/i.test(text);
147
+
148
+ const browserSignals = Boolean(urlCandidate)
149
+ || Boolean(searchQuery)
150
+ || /\b(go\s+to|navigate|visit|open\s+youtube|youtube\.com|search)\b/i.test(text);
151
+
93
152
  // Heuristic: infer app family
94
153
  const appHints = {
95
- isBrowser: /\b(edge|chrome|browser|msedge)\b/i.test(text),
154
+ isBrowser: /\b(edge|chrome|chromium|firefox|brave|opera|vivaldi|browser|msedge)\b/i.test(text) || browserSignals,
96
155
  isEditor: /\b(vs\s*code|visual\s*studio\s*code|code\s*-\s*insiders|editor)\b/i.test(text),
97
156
  isTerminal: /\b(terminal|powershell|cmd\.exe|command\s+prompt|windows\s+terminal)\b/i.test(text),
98
157
  isExplorer: /\b(file\s+explorer|explorer\.exe)\b/i.test(text),
99
158
  };
100
159
 
160
+ const requestedBrowser = (() => {
161
+ // Ordered from most-specific to least-specific
162
+ if (/\bedge\s+beta\b/i.test(text)) return { name: 'edge', keywords: ['edge', 'msedge', 'beta'] };
163
+ if (/\bmsedge\b/i.test(text) || /\bmicrosoft\s+edge\b/i.test(text) || /\bedge\b/i.test(text)) return { name: 'edge', keywords: ['edge', 'msedge'] };
164
+ if (/\bgoogle\s+chrome\b/i.test(text) || /\bchrome\b/i.test(text) || /\bchromium\b/i.test(text)) return { name: 'chrome', keywords: ['chrome', 'chromium'] };
165
+ if (/\bmozilla\s+firefox\b/i.test(text) || /\bfirefox\b/i.test(text)) return { name: 'firefox', keywords: ['firefox'] };
166
+ if (/\bbrave\b/i.test(text)) return { name: 'brave', keywords: ['brave'] };
167
+ if (/\bvivaldi\b/i.test(text)) return { name: 'vivaldi', keywords: ['vivaldi'] };
168
+ if (/\bopera\b/i.test(text)) return { name: 'opera', keywords: ['opera'] };
169
+ return null;
170
+ })();
171
+
101
172
  // Infer intent
102
173
  const intent = (() => {
103
174
  if (/\bclose\b/.test(lower) && /\btab\b/.test(lower)) return 'close_tab';
104
175
  if (/\bclose\b/.test(lower) && /\bwindow\b/.test(lower)) return 'close_window';
176
+ if (appHints.isBrowser && (urlCandidate || searchQuery)) return 'browser_navigate';
177
+ if (appHints.isBrowser && /\b(new\s+tab|open\s+tab|ctrl\+t|ctrl\+l|navigate|go\s+to|visit|open\s+youtube|youtube\.com|search\s+for|search)\b/i.test(text)) return 'browser_navigate';
105
178
  if (/\bclick\b/.test(lower)) return 'click';
106
179
  if (/\btype\b/.test(lower) || /\benter\b/.test(lower)) return 'type';
107
180
  if (/\bscroll\b/.test(lower)) return 'scroll';
@@ -123,9 +196,42 @@ function parseRequestHints(requestText) {
123
196
  tabTitle,
124
197
  appHints,
125
198
  elementTextCandidates,
199
+ wantsNewTab,
200
+ urlCandidate,
201
+ searchQuery,
202
+ requestedBrowser,
203
+ wantsIntegratedBrowser,
126
204
  };
127
205
  }
128
206
 
207
+ function isLikelyBrowserWindow(win) {
208
+ const title = win?.title || '';
209
+ const proc = win?.processName || '';
210
+ return (
211
+ includesCI(proc, 'msedge') || includesCI(title, 'edge') ||
212
+ includesCI(proc, 'chrome') || includesCI(title, 'chrome') ||
213
+ includesCI(proc, 'firefox') || includesCI(title, 'firefox') ||
214
+ includesCI(proc, 'brave') || includesCI(title, 'brave') ||
215
+ includesCI(proc, 'opera') || includesCI(title, 'opera') ||
216
+ includesCI(proc, 'vivaldi') || includesCI(title, 'vivaldi')
217
+ );
218
+ }
219
+
220
+ function isLikelyVSCodeWindow(win) {
221
+ const title = win?.title || '';
222
+ const proc = win?.processName || '';
223
+ return (
224
+ includesCI(proc, 'Code') || includesCI(proc, 'Code - Insiders') ||
225
+ includesCI(title, 'Visual Studio Code')
226
+ );
227
+ }
228
+
229
+ function isLocalhostUrl(urlish) {
230
+ const u = normalizeText(urlish);
231
+ if (!u) return false;
232
+ return /^(https?:\/\/)?(localhost|127\.0\.0\.1)(:\d+)?(\/|$)/i.test(u);
233
+ }
234
+
129
235
  function scoreWindowCandidate(win, hints) {
130
236
  let score = 0;
131
237
  const reasons = [];
@@ -138,10 +244,20 @@ function scoreWindowCandidate(win, hints) {
138
244
  reasons.push('title matches windowHint');
139
245
  }
140
246
 
141
- if (hints.appHints?.isBrowser && (includesCI(proc, 'msedge') || includesCI(title, 'edge') || includesCI(proc, 'chrome') || includesCI(title, 'chrome'))) {
247
+ const looksLikeBrowser = isLikelyBrowserWindow(win);
248
+
249
+ if (hints.appHints?.isBrowser && looksLikeBrowser) {
142
250
  score += 35;
143
251
  reasons.push('looks like browser');
144
252
  }
253
+
254
+ if (hints.requestedBrowser?.keywords?.length) {
255
+ const matchesPreferred = hints.requestedBrowser.keywords.some(k => includesCI(proc, k) || includesCI(title, k));
256
+ if (matchesPreferred) {
257
+ score += 25;
258
+ reasons.push(`matches requested browser (${hints.requestedBrowser.name})`);
259
+ }
260
+ }
145
261
  if (hints.appHints?.isEditor && (includesCI(title, 'visual studio code') || includesCI(title, 'code - insiders') || includesCI(proc, 'Code') || includesCI(proc, 'Code - Insiders'))) {
146
262
  score += 35;
147
263
  reasons.push('looks like editor');
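The two browser-related score contributions in this hunk (35 for any browser-looking window, plus 25 when the window matches the requested browser) can be isolated in a short sketch. `includesCI` is not shown in the diff and is assumed here to be a case-insensitive substring test; `browserScore` is a hypothetical name, and the other scoring rules of `scoreWindowCandidate` are omitted.

```javascript
// Assumption: includesCI is a case-insensitive substring test.
const includesCI = (haystack, needle) =>
  String(haystack || '').toLowerCase().includes(String(needle || '').toLowerCase());

// Same checks as isLikelyBrowserWindow in the diff, written compactly.
function isLikelyBrowserWindow(win) {
  const title = win?.title || '';
  const proc = win?.processName || '';
  if (includesCI(proc, 'msedge') || includesCI(title, 'edge')) return true;
  return ['chrome', 'firefox', 'brave', 'opera', 'vivaldi']
    .some((k) => includesCI(proc, k) || includesCI(title, k));
}

// Hypothetical helper: only the browser-related part of the score.
function browserScore(win, hints) {
  let score = 0;
  const reasons = [];
  if (hints.appHints?.isBrowser && isLikelyBrowserWindow(win)) {
    score += 35;
    reasons.push('looks like browser');
  }
  const keywords = hints.requestedBrowser?.keywords || [];
  if (keywords.some((k) => includesCI(win?.processName || '', k) || includesCI(win?.title || '', k))) {
    score += 25;
    reasons.push(`matches requested browser (${hints.requestedBrowser.name})`);
  }
  return { score, reasons };
}

const demoHints = {
  appHints: { isBrowser: true },
  requestedBrowser: { name: 'firefox', keywords: ['firefox'] },
};
const firefoxWin = { title: 'MDN Web Docs - Mozilla Firefox', processName: 'firefox' };
const edgeWin = { title: 'Docs - Microsoft Edge', processName: 'msedge' };
const notepadWin = { title: 'edge cases.txt - Notepad', processName: 'notepad' };

console.log(browserScore(firefoxWin, demoHints).score); // requested browser
console.log(browserScore(edgeWin, demoHints).score);    // some other browser
console.log(browserScore(notepadWin, demoHints).score); // title happens to contain "edge"
```

Under the substring assumption, the last case shows that a non-browser window whose title contains a browser keyword would also collect the 35-point bonus, which may be worth checking against the real `includesCI`.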
@@ -164,8 +280,34 @@ function scoreWindowCandidate(win, hints) {
164
280
  }
165
281
 
166
282
  function buildSuggestedPlan(hints, activeWindow, rankedCandidates) {
167
- const top = rankedCandidates?.[0]?.window || null;
168
- const target = top || activeWindow || null;
283
+ const windowsRanked = Array.isArray(rankedCandidates) ? rankedCandidates.map(c => c.window).filter(Boolean) : [];
284
+ const browserWindowsRanked = windowsRanked.filter(isLikelyBrowserWindow);
285
+ const vsCodeWindowsRanked = windowsRanked.filter(isLikelyVSCodeWindow);
286
+
287
+ const target = (() => {
288
+ // If the user explicitly wants the VS Code integrated browser, target VS Code.
289
+ if (hints.wantsIntegratedBrowser) {
290
+ if (vsCodeWindowsRanked[0]) return vsCodeWindowsRanked[0];
291
+ if (activeWindow && isLikelyVSCodeWindow(activeWindow)) return activeWindow;
292
+ return windowsRanked[0] || activeWindow || null;
293
+ }
294
+
295
+ // For browser actions, never target an arbitrary non-browser window.
296
+ if (hints.intent === 'browser_navigate' && hints.appHints?.isBrowser) {
297
+ if (hints.requestedBrowser?.keywords?.length) {
298
+ const preferred = browserWindowsRanked.find(w => hints.requestedBrowser.keywords.some(k => includesCI(w?.processName || '', k) || includesCI(w?.title || '', k)));
299
+ if (preferred) return preferred;
300
+ }
301
+
302
+ // Fallback to any detected browser window, else the active window if it is a browser.
303
+ if (browserWindowsRanked[0]) return browserWindowsRanked[0];
304
+ if (activeWindow && isLikelyBrowserWindow(activeWindow)) return activeWindow;
305
+ return null;
306
+ }
307
+
308
+ // Non-browser intents: use ranking, then active window.
309
+ return windowsRanked[0] || activeWindow || null;
310
+ })();
169
311
  const plan = [];
170
312
 
171
313
  const targetTitleForFilter = target?.title ? String(target.title) : null;
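The fallback order for `browser_navigate` targets in this hunk (requested browser, then any browser window, then the active window if it is a browser, else `null`) can be sketched in isolation. `pickBrowserTarget` is a hypothetical name, and the `isBrowser` predicate is injected so the example stays short; the simplified predicate used in the demo is a stand-in, not the one from the diff.

```javascript
// Sketch of the browser target fallback order. Never returns a non-browser window.
function pickBrowserTarget(rankedWindows, activeWindow, isBrowser, keywords = []) {
  const browsers = rankedWindows.filter(isBrowser);
  const matchesKeyword = (w) => keywords.some((k) => {
    const needle = k.toLowerCase();
    return (w.processName || '').toLowerCase().includes(needle)
      || (w.title || '').toLowerCase().includes(needle);
  });
  const preferred = browsers.find(matchesKeyword);
  if (preferred) return preferred;          // 1) requested browser
  if (browsers[0]) return browsers[0];      // 2) best-ranked browser window
  if (activeWindow && isBrowser(activeWindow)) return activeWindow; // 3) active browser
  return null;                              // 4) no browser at all
}

// Simplified stand-in predicate for the demo only.
const isBrowserDemo = (w) => /msedge|chrome|firefox/i.test(w.processName || '');

const notepad = { title: 'Untitled - Notepad', processName: 'notepad' };
const chrome = { title: 'Inbox - Google Chrome', processName: 'chrome' };
const firefox = { title: 'Mozilla Firefox', processName: 'firefox' };
const ranked = [notepad, chrome, firefox];

console.log(pickBrowserTarget(ranked, notepad, isBrowserDemo, ['firefox']).processName);
console.log(pickBrowserTarget(ranked, notepad, isBrowserDemo).processName);
console.log(pickBrowserTarget([notepad], notepad, isBrowserDemo));
```

Returning `null` instead of the top-ranked window is what lets the plan emit its `NO_BROWSER_WINDOW` step rather than sending keys to an unrelated application.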
@@ -221,6 +363,171 @@ function buildSuggestedPlan(hints, activeWindow, rankedCandidates) {
221
363
  return { target, plan };
222
364
  }
223
365
 
366
+ if (hints.intent === 'browser_navigate' && hints.appHints?.isBrowser) {
367
+ // If running inside VS Code and the user wants it, prefer using the Integrated Browser.
368
+ if (hints.wantsIntegratedBrowser) {
369
+ const url = toHttpsUrl(hints.urlCandidate) || buildSearchUrl({ query: hints.searchQuery, preferYouTube: false });
370
+ const localhostish = isLocalhostUrl(hints.urlCandidate);
371
+
372
+ plan.push({
373
+ state: 'OPEN_INTEGRATED_BROWSER',
374
+ goal: 'Open VS Code Integrated Browser',
375
+ command: 'liku keys ctrl+shift+p',
376
+ verification: 'Command Palette opens',
377
+ notes: 'Run the VS Code command: "Browser: Open Integrated Browser"',
378
+ });
379
+ plan.push({
380
+ state: 'COMMAND_INTEGRATED_BROWSER',
381
+ goal: 'Run the Integrated Browser command',
382
+ command: 'liku type "Browser: Open Integrated Browser"',
383
+ verification: 'The command appears in the palette',
384
+ });
385
+ plan.push({
386
+ state: 'CONFIRM_COMMAND',
387
+ goal: 'Execute the command',
388
+ command: 'liku keys enter',
389
+ verification: 'An Integrated Browser editor tab opens',
390
+ notes: localhostish
391
+ ? 'Tip: enable the VS Code setting workbench.browser.openLocalhostLinks to automatically open localhost links in the integrated browser.'
392
+ : 'Integrated Browser supports http(s) and file URLs.',
393
+ });
394
+
395
+ if (localhostish) {
396
+ plan.push({
397
+ state: 'OPEN_SETTINGS',
398
+ goal: 'Open VS Code Settings (optional)',
399
+ command: 'liku keys ctrl+,',
400
+ verification: 'Settings UI opens',
401
+ });
402
+ plan.push({
403
+ state: 'FIND_SETTING',
404
+ goal: 'Locate the localhost-integrated-browser setting',
405
+ command: 'liku type "workbench.browser.openLocalhostLinks"',
406
+ verification: 'The setting appears in search results',
407
+ notes: 'Enable it to route localhost links to the Integrated Browser.',
408
+ });
409
+ plan.push({
410
+ state: 'VERIFY_SETTING',
411
+ goal: 'Capture evidence of the setting state',
412
+ command: 'liku screenshot',
413
+ verification: 'Screenshot shows the setting and whether it is enabled',
414
+ });
415
+ }
416
+
417
+ if (url) {
418
+ plan.push({
419
+ state: 'FOCUS_ADDRESS_BAR',
420
+ goal: 'Focus the integrated browser address bar',
421
+ command: 'liku keys ctrl+l',
422
+ verification: 'Address bar is focused (URL text highlighted)',
423
+ });
424
+ plan.push({
425
+ state: 'TYPE_URL',
426
+ goal: 'Type the destination URL',
427
+ command: `liku type "${escapeDoubleQuotes(url)}"`,
428
+ verification: 'The full URL appears correctly in the address bar',
429
+ });
430
+ plan.push({
431
+ state: 'NAVIGATE',
432
+ goal: 'Navigate to the URL in the integrated browser',
433
+ command: 'liku keys enter',
434
+ verification: 'Page begins loading; content changes',
435
+ });
436
+ } else {
437
+ plan.push({
438
+ state: 'MISSING_URL',
439
+ goal: 'No URL could be inferred from the request',
440
+ command: 'liku screenshot',
441
+ verification: 'Use the screenshot to decide the next navigation step',
442
+ });
443
+ }
444
+
445
+ plan.push({
446
+ state: 'VERIFY_RESULT',
447
+ goal: 'Capture evidence of the resulting page state',
448
+ command: 'liku screenshot',
449
+ verification: 'Screenshot shows expected page state in the integrated browser',
450
+ });
451
+
452
+ return { target, plan };
453
+ }
454
+
455
+ if (!target) {
456
+ plan.push({
457
+ state: 'NO_BROWSER_WINDOW',
458
+ goal: 'No browser window was detected; open a browser window first',
459
+ command: 'liku window',
460
+ verification: 'A browser window (Edge/Chrome/Firefox/Brave/etc) appears in the list',
461
+ });
462
+ return { target: null, plan };
463
+ }
464
+
465
+ // Prefer deterministic in-window navigation over process launch.
466
+ const preferYouTube = /\byoutube\b/i.test(hints.raw || '') || /youtube\.com/i.test(hints.raw || '');
467
+ const url = (
468
+ toHttpsUrl(hints.urlCandidate) ||
469
+ buildSearchUrl({ query: hints.searchQuery, preferYouTube })
470
+ );
471
+
472
+ if (hints.wantsNewTab) {
473
+ plan.push({
474
+ state: 'OPEN_NEW_TAB',
475
+ goal: 'Open a new tab in the focused browser window',
476
+ command: 'liku keys ctrl+t',
477
+ verification: 'A new tab opens (tab count increases or blank tab appears)',
478
+ });
479
+ }
480
+
481
+ plan.push({
482
+ state: 'FOCUS_ADDRESS_BAR',
483
+ goal: 'Focus the address bar',
484
+ command: 'liku keys ctrl+l',
485
+ verification: 'Address bar is focused (URL text highlighted)',
486
+ notes: 'If focus is flaky, re-run `liku window --active` and re-focus the browser window before sending keys.',
487
+ });
488
+
489
+ if (url) {
490
+ plan.push({
491
+ state: 'TYPE_URL',
492
+ goal: `Type the destination URL${hints.searchQuery ? ' (search encoded into URL for reliability)' : ''}`,
493
+ command: `liku type "${escapeDoubleQuotes(url)}"`,
494
+ verification: 'The full URL appears correctly in the address bar',
495
+ notes: 'If characters drop: ctrl+l → ctrl+a → type URL again → enter (with short pauses).',
496
+ });
497
+ plan.push({
498
+ state: 'NAVIGATE',
499
+ goal: 'Navigate to the URL in the current tab',
500
+ command: 'liku keys enter',
501
+ verification: 'Page begins loading; title/content changes',
502
+ });
503
+ } else {
504
+ plan.push({
505
+ state: 'MISSING_URL',
506
+ goal: 'No URL could be inferred from the request',
507
+ command: 'liku screenshot',
508
+ verification: 'Use the screenshot to decide the next navigation step',
509
+ });
510
+ }
511
+
512
+ plan.push({
513
+ state: 'VERIFY_FOCUS',
514
+ goal: 'Verify keyboard focus stayed on the browser window',
515
+ command: 'liku window --active',
516
+ verification: hints.requestedBrowser?.name
517
+ ? `Active window process/title matches the requested browser (${hints.requestedBrowser.name})`
518
+ : 'Active window process/title matches a browser window',
519
+ });
520
+
521
+ plan.push({
522
+ state: 'VERIFY_RESULT',
523
+ goal: 'Capture evidence of the resulting page state',
524
+ command: 'liku screenshot',
525
+ verification: 'Screenshot shows expected page (e.g., YouTube results for query)',
526
+ });
527
+
528
+ return { target, plan };
529
+ }
530
+
224
531
  if (hints.intent === 'close_window') {
225
532
  plan.push({
226
533
  state: 'EXECUTE_ACTION',
@@ -496,6 +496,32 @@ function getPlatformContext() {
496
496
  - **Reopen closed tab**: \`ctrl+shift+t\`
497
497
  - **Close window**: \`ctrl+shift+w\`
498
498
 
499
+ ### Browser Automation Policy (Robust)
500
+ When the user asks to **use an existing browser window/tab** (Edge/Chrome), prefer **in-window control** (focus + keys) instead of launching processes.
501
+
502
+ - **DO NOT** use PowerShell COM \`SendKeys\` or \`Start-Process msedge\` / \`microsoft-edge:\` to control an existing tab. These are unreliable and may open new windows/tabs unexpectedly.
503
+ - **DO** use Liku actions: \`bring_window_to_front\` / \`focus_window\` + \`key\` + \`type\` + \`wait\`.
504
+ - **Chain the whole flow in one action block** so focus is maintained; avoid pausing for manual validation.
505
+
506
+ **Reliable recipes:**
507
+ - **Open a new tab in the existing Edge/Chrome window**:
508
+ 1) bring window to front
509
+ 2) wait 300–800ms
510
+ 3) \`ctrl+t\`
511
+ 4) wait 200–500ms
512
+ - **Navigate the current tab to a URL**:
513
+ 1) \`ctrl+l\` (address bar)
514
+ 2) wait 150–300ms
515
+ 3) type full URL (prefer \`https://...\`)
516
+ 4) \`enter\`
517
+ 5) wait 2000–5000ms (page load)
518
+ - **Self-heal if text drops/mis-types**: \`ctrl+l\` → \`ctrl+a\` → type again → \`enter\` (add waits)
519
+ - **YouTube search (keyboard-first)**: press \`/\` to focus search → type query → \`enter\` → wait
520
+
521
+ **Verification guidance:**
522
+ - If unsure whether the right window/tab is active, take a quick \`screenshot\` and proceed only when the browser is clearly focused.
523
+ - Validate major state changes (after focus, after navigation, after submitting search). If validation fails, retry focus + navigation (bounded retries).
524
+
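As an illustration of "chain the whole flow in one action block", the recipes above could be assembled into a single action list. Only the `type` and `key` field names appear elsewhere in this diff; the `title`, `text`, and `ms` fields and the helper name `buildNewTabNavigation` are assumptions, and the wait values are simply picked from the ranges given in the recipes.

```javascript
// Sketch: one chained action block for "new tab, then navigate".
// Field names other than `type` and `key` are hypothetical.
function buildNewTabNavigation(windowTitle, url) {
  return [
    { type: 'bring_window_to_front', title: windowTitle },
    { type: 'wait', ms: 500 },   // 300-800ms after focusing
    { type: 'key', key: 'ctrl+t' },
    { type: 'wait', ms: 300 },   // 200-500ms after opening the tab
    { type: 'key', key: 'ctrl+l' },
    { type: 'wait', ms: 200 },   // 150-300ms before typing
    { type: 'type', text: url },
    { type: 'key', key: 'enter' },
    { type: 'wait', ms: 3000 },  // 2000-5000ms for page load
  ];
}

const chain = buildNewTabNavigation('Microsoft Edge', 'https://example.com');
console.log(chain.map((a) => a.type).join(' > '));
```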
499
525
  ### Focus Rule (CRITICAL)
500
526
  Before sending keyboard shortcuts, make sure the intended app window is focused.
501
527
  If the overlay/chat has focus, shortcuts like \`ctrl+w\` / \`ctrl+shift+w\` may close the overlay instead of the target app.
@@ -1920,6 +1946,27 @@ function analyzeActionSafety(action, targetInfo = {}) {
1920
1946
  case 'key':
1921
1947
  // Analyze key combinations
1922
1948
  const key = (action.key || '').toLowerCase();
1949
+ const keyNorm = key.replace(/\s+/g, '');
1950
+
1951
+ // Treat window/tab/app-close shortcuts as CRITICAL risk: they can instantly close the overlay,
1952
+ // the active terminal tab/window, a browser window, or dismiss important dialogs.
1953
+ // Require explicit confirmation so smaller models can't accidentally "self-close" the UI.
1954
+ const closeCombos = [
1955
+ 'alt+f4',
1956
+ 'ctrl+w',
1957
+ 'ctrl+shift+w',
1958
+ 'ctrl+q',
1959
+ 'ctrl+shift+q',
1960
+ 'cmd+w',
1961
+ 'cmd+q',
1962
+ ];
1963
+ if (closeCombos.includes(keyNorm)) {
1964
+ result.riskLevel = ActionRiskLevel.CRITICAL;
1965
+ result.warnings.push(`Close shortcut detected: ${action.key}`);
1966
+ result.requiresConfirmation = true;
1967
+ break;
1968
+ }
1969
+
1923
1970
  if (key.includes('delete') || key.includes('backspace')) {
1924
1971
  result.riskLevel = ActionRiskLevel.HIGH;
1925
1972
  result.warnings.push('Delete/Backspace key may remove content');
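The close-shortcut check in this hunk normalizes case and whitespace before an exact list lookup. A standalone sketch of that matching (the function name `isCloseShortcut` is hypothetical) makes the consequences visible:

```javascript
// Same list and normalization as the hunk above, extracted for illustration.
const CLOSE_COMBOS = ['alt+f4', 'ctrl+w', 'ctrl+shift+w', 'ctrl+q', 'ctrl+shift+q', 'cmd+w', 'cmd+q'];

function isCloseShortcut(rawKey) {
  const keyNorm = String(rawKey || '').toLowerCase().replace(/\s+/g, '');
  return CLOSE_COMBOS.includes(keyNorm);
}

console.log(isCloseShortcut('Ctrl + W'));     // spacing and case are normalized
console.log(isCloseShortcut('alt+F4'));
console.log(isCloseShortcut('ctrl+t'));       // not a close shortcut
console.log(isCloseShortcut('shift+ctrl+w')); // modifier order is not normalized
```

Because the lookup is an exact match, a combo written with modifiers in a different order would not be flagged by this check alone; whether such inputs can reach this code path is not visible in the diff.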
@@ -2154,9 +2201,21 @@ async function executeActions(actionData, onAction = null, onScreenshot = null,
2154
2201
  const results = [];
2155
2202
  let screenshotRequested = false;
2156
2203
  let pendingConfirmation = false;
2204
+ let lastTargetWindowHandle = null;
2157
2205
 
2158
2206
  for (let i = 0; i < actionData.actions.length; i++) {
2159
2207
  const action = actionData.actions[i];
2208
+
2209
+ // Track the intended target window across steps so later key/type actions can
2210
+ // re-focus it. Without this, focus can drift back to the overlay/terminal.
2211
+ if (action.type === 'focus_window' || action.type === 'bring_window_to_front') {
2212
+ try {
2213
+ const hwnd = await systemAutomation.resolveWindowHandle(action);
2214
+ if (hwnd) {
2215
+ lastTargetWindowHandle = hwnd;
2216
+ }
2217
+ } catch {}
2218
+ }
2160
2219
 
2161
2220
  // Handle screenshot requests specially
2162
2221
  if (action.type === 'screenshot') {
@@ -2179,9 +2238,14 @@ async function executeActions(actionData, onAction = null, onScreenshot = null,
2179
2238
  // Analyze safety
2180
2239
  const safety = analyzeActionSafety(action, targetInfo);
2181
2240
  console.log(`[AI-SERVICE] Action ${i} safety: ${safety.riskLevel}`, safety.warnings);
2241
+
2242
+ // CRITICAL actions require an explicit confirmation step, even if the user clicked
2243
+ // the general "Execute" button for a batch. This prevents accidental destructive
2244
+ // shortcuts (e.g., alt+f4) from immediately closing the active app due to focus issues.
2245
+ const canBypassConfirmation = skipSafetyConfirmation && safety.riskLevel !== ActionRiskLevel.CRITICAL;
2182
2246
 
2183
2247
  // If HIGH or CRITICAL risk, require confirmation (unless user already confirmed via Execute button)
2184
- if (safety.requiresConfirmation && !skipSafetyConfirmation) {
2248
+ if (safety.requiresConfirmation && !canBypassConfirmation) {
2185
2249
  console.log(`[AI-SERVICE] Action ${i} requires user confirmation`);
2186
2250
 
2187
2251
  // Store as pending action
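The effect of `canBypassConfirmation` is easiest to see as a small truth table. In the sketch below, the string values of `ActionRiskLevel` are assumed (the diff only shows the member names), and `mustPauseForConfirmation` is a hypothetical wrapper around the two expressions from the hunk.

```javascript
// Assumption: enum members map to strings of the same name.
const ActionRiskLevel = { SAFE: 'SAFE', LOW: 'LOW', MEDIUM: 'MEDIUM', HIGH: 'HIGH', CRITICAL: 'CRITICAL' };

// Mirrors the condition in the hunk: pre-confirmation via the Execute button
// covers everything except CRITICAL actions.
function mustPauseForConfirmation(safety, skipSafetyConfirmation) {
  const canBypassConfirmation = skipSafetyConfirmation && safety.riskLevel !== ActionRiskLevel.CRITICAL;
  return Boolean(safety.requiresConfirmation && !canBypassConfirmation);
}

const high = { riskLevel: ActionRiskLevel.HIGH, requiresConfirmation: true };
const critical = { riskLevel: ActionRiskLevel.CRITICAL, requiresConfirmation: true };
const low = { riskLevel: ActionRiskLevel.LOW, requiresConfirmation: false };

console.log(mustPauseForConfirmation(high, true));      // Execute pre-confirms HIGH
console.log(mustPauseForConfirmation(critical, true));  // CRITICAL still pauses
console.log(mustPauseForConfirmation(high, false));     // no pre-confirmation
console.log(mustPauseForConfirmation(low, true));       // nothing to confirm
```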
@@ -2204,7 +2268,11 @@ async function executeActions(actionData, onAction = null, onScreenshot = null,
2204
2268
  }
2205
2269
 
2206
2270
  if (skipSafetyConfirmation && safety.requiresConfirmation) {
2207
- console.log(`[AI-SERVICE] Action ${i} safety bypassed (user pre-confirmed via Execute button)`);
2271
+ if (canBypassConfirmation) {
2272
+ console.log(`[AI-SERVICE] Action ${i} safety bypassed (user pre-confirmed via Execute button)`);
2273
+ } else {
2274
+ console.log(`[AI-SERVICE] Action ${i} requires explicit confirmation (CRITICAL)`);
2275
+ }
2208
2276
  }
2209
2277
 
2210
2278
  // Execute the action (SAFE/LOW/MEDIUM risk)
@@ -2214,6 +2282,7 @@ async function executeActions(actionData, onAction = null, onScreenshot = null,
2214
2282
  if (uiWatcher && uiWatcher.isPolling) {
2215
2283
  const elementAtPoint = uiWatcher.getElementAtPoint(action.x, action.y);
2216
2284
  if (elementAtPoint && elementAtPoint.windowHandle) {
2285
+ lastTargetWindowHandle = elementAtPoint.windowHandle;
2217
2286
  // Found an element with a known window handle
2218
2287
  // Focus it first to ensure click goes to the right window (not trapped by overlay or obscuring window)
2219
2288
  // We can call systemAutomation.focusWindow directly
@@ -2224,11 +2293,35 @@ async function executeActions(actionData, onAction = null, onScreenshot = null,
2224
2293
  }
2225
2294
  }
2226
2295
 
2296
+ // Ensure keyboard input goes to the last known target window.
2297
+ if ((action.type === 'key' || action.type === 'type') && lastTargetWindowHandle) {
2298
+ console.log(`[AI-SERVICE] Re-focusing last target window ${lastTargetWindowHandle} before ${action.type}`);
2299
+ await systemAutomation.focusWindow(lastTargetWindowHandle);
2300
+ await new Promise(r => setTimeout(r, 125));
2301
+ }
2302
+
2227
2303
  const result = await (actionExecutor ? actionExecutor(action) : systemAutomation.executeAction(action));
2228
2304
  result.reason = action.reason || '';
2229
2305
  result.safety = safety;
2230
2306
  results.push(result);
2231
2307
 
2308
+ // If we just performed a step that likely changed focus, snapshot the actual foreground HWND.
2309
+ // This is especially important when uiWatcher isn't polling (can't infer windowHandle).
2310
+ if (typeof systemAutomation.getForegroundWindowHandle === 'function') {
2311
+ if (
2312
+ action.type === 'click' ||
2313
+ action.type === 'double_click' ||
2314
+ action.type === 'right_click' ||
2315
+ action.type === 'focus_window' ||
2316
+ action.type === 'bring_window_to_front'
2317
+ ) {
2318
+ const fg = await systemAutomation.getForegroundWindowHandle();
2319
+ if (fg) {
2320
+ lastTargetWindowHandle = fg;
2321
+ }
2322
+ }
2323
+ }
2324
+
2232
2325
  // Callback for UI updates
2233
2326
  if (onAction) {
2234
2327
  onAction(result, i, actionData.actions.length);
@@ -2270,10 +2363,20 @@ async function resumeAfterConfirmation(onAction = null, onScreenshot = null, opt
2270
2363
 
2271
2364
  const results = [...pending.completedResults];
2272
2365
  let screenshotRequested = false;
2366
+ let lastTargetWindowHandle = null;
2273
2367
 
2274
2368
  // Execute the confirmed action and remaining actions
2275
2369
  for (let i = 0; i < pending.remainingActions.length; i++) {
2276
2370
  const action = pending.remainingActions[i];
2371
+
2372
+ if (action.type === 'focus_window' || action.type === 'bring_window_to_front') {
2373
+ try {
2374
+ const hwnd = await systemAutomation.resolveWindowHandle(action);
2375
+ if (hwnd) {
2376
+ lastTargetWindowHandle = hwnd;
2377
+ }
2378
+ } catch {}
2379
+ }
2277
2380
 
2278
2381
  if (action.type === 'screenshot') {
2279
2382
  screenshotRequested = true;
@@ -2283,12 +2386,45 @@ async function resumeAfterConfirmation(onAction = null, onScreenshot = null, opt
    results.push({ success: true, action: 'screenshot', message: 'Screenshot captured' });
    continue;
  }
+
+ if ((action.type === 'click' || action.type === 'double_click' || action.type === 'right_click') && action.x !== undefined) {
+   if (uiWatcher && uiWatcher.isPolling) {
+     const elementAtPoint = uiWatcher.getElementAtPoint(action.x, action.y);
+     if (elementAtPoint && elementAtPoint.windowHandle) {
+       lastTargetWindowHandle = elementAtPoint.windowHandle;
+       console.log(`[AI-SERVICE] (resume) Auto-focusing window handle ${elementAtPoint.windowHandle} for click at (${action.x}, ${action.y})`);
+       await systemAutomation.focusWindow(elementAtPoint.windowHandle);
+       await new Promise(r => setTimeout(r, 450));
+     }
+   }
+ }
+
+ if ((action.type === 'key' || action.type === 'type') && lastTargetWindowHandle) {
+   console.log(`[AI-SERVICE] (resume) Re-focusing last target window ${lastTargetWindowHandle} before ${action.type}`);
+   await systemAutomation.focusWindow(lastTargetWindowHandle);
+   await new Promise(r => setTimeout(r, 125));
+ }

  // Execute action (user confirmed, skip safety for first action)
  const result = await (actionExecutor ? actionExecutor(action) : systemAutomation.executeAction(action));
  result.reason = action.reason || '';
  result.userConfirmed = i === 0; // First one was confirmed
  results.push(result);
+
+ if (typeof systemAutomation.getForegroundWindowHandle === 'function') {
+   if (
+     action.type === 'click' ||
+     action.type === 'double_click' ||
+     action.type === 'right_click' ||
+     action.type === 'focus_window' ||
+     action.type === 'bring_window_to_front'
+   ) {
+     const fg = await systemAutomation.getForegroundWindowHandle();
+     if (fg) {
+       lastTargetWindowHandle = fg;
+     }
+   }
+ }

  if (onAction) {
    onAction(result, pending.actionIndex + i, pending.actionIndex + pending.remainingActions.length);
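Reviewer note: the hunk above remembers the window that a click landed in and re-focuses it before later `key` / `type` actions. The following is a minimal sketch of that bookkeeping, not the package's code: `makeFakeAutomation` and `runActions` are hypothetical names, the fake object stands in for `systemAutomation`, and the settle delays and the `uiWatcher` lookup are left out.

```javascript
// Hypothetical stand-in for systemAutomation that records focus calls.
function makeFakeAutomation(foreground) {
  const focused = [];
  return {
    focused,
    async focusWindow(hwnd) { focused.push(hwnd); },
    async getForegroundWindowHandle() { return foreground; },
    async executeAction(action) { return { success: true, action: action.type }; },
  };
}

// Simplified version of the loop body: track the last target window and
// restore focus to it before keyboard input.
async function runActions(actions, automation) {
  let lastTargetWindowHandle = null;
  for (const action of actions) {
    // Keyboard input goes to whichever window has focus, so re-focus first.
    if ((action.type === 'key' || action.type === 'type') && lastTargetWindowHandle) {
      await automation.focusWindow(lastTargetWindowHandle);
    }
    await automation.executeAction(action);
    // After a click, remember the window that ended up in the foreground.
    if (action.type === 'click') {
      const fg = await automation.getForegroundWindowHandle();
      if (fg) lastTargetWindowHandle = fg;
    }
  }
  return lastTargetWindowHandle;
}
```

With a sequence of `type`, `click`, `type`, only the second `type` triggers a re-focus, because no target window is known before the first click.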
package/src/main/index.js CHANGED
@@ -1,3 +1,65 @@
+ function isBrokenPipeLikeError(err) {
+   const code = err && err.code;
+   return (
+     code === 'EPIPE' ||
+     code === 'ERR_STREAM_DESTROYED' ||
+     code === 'ERR_STREAM_WRITE_AFTER_END'
+   );
+ }
+
+ function patchConsoleForBrokenPipes() {
+   const methods = ['log', 'info', 'warn', 'error'];
+   const originals = {};
+   let stdioDisabled = false;
+
+   for (const method of methods) {
+     originals[method] = typeof console[method] === 'function'
+       ? console[method].bind(console)
+       : () => {};
+
+     console[method] = (...args) => {
+       if (stdioDisabled) return;
+       try {
+         originals[method](...args);
+       } catch (e) {
+         if (isBrokenPipeLikeError(e)) {
+           stdioDisabled = true;
+           return;
+         }
+         throw e;
+       }
+     };
+   }
+
+   const swallowStreamError = (stream) => {
+     if (!stream || typeof stream.on !== 'function') return;
+     stream.on('error', (e) => {
+       if (isBrokenPipeLikeError(e)) {
+         stdioDisabled = true;
+         return;
+       }
+     });
+   };
+
+   swallowStreamError(process.stdout);
+   swallowStreamError(process.stderr);
+ }
+
+ patchConsoleForBrokenPipes();
+
+ process.on('uncaughtException', (err) => {
+   if (isBrokenPipeLikeError(err)) {
+     return;
+   }
+   throw err;
+ });
+
+ process.on('unhandledRejection', (reason) => {
+   if (isBrokenPipeLikeError(reason)) {
+     return;
+   }
+ });
+
  // Ensure Electron runs in app mode even if a dev shell has ELECTRON_RUN_AS_NODE set
  if (process.env.ELECTRON_RUN_AS_NODE) {
    console.warn('ELECTRON_RUN_AS_NODE was set; clearing so the app can start normally.');
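Reviewer note: the guard added at the top of `index.js` swallows broken-pipe style write errors and disables further console output once one is seen, while letting unrelated errors propagate. A minimal sketch of that behavior on a plain function instead of `console` follows; `makeGuardedWriter` is a hypothetical helper introduced only for illustration, and the throwing writer simulates a closed pipe.

```javascript
// Same predicate as in the hunk above.
function isBrokenPipeLikeError(err) {
  const code = err && err.code;
  return (
    code === 'EPIPE' ||
    code === 'ERR_STREAM_DESTROYED' ||
    code === 'ERR_STREAM_WRITE_AFTER_END'
  );
}

// Hypothetical helper (not in the package): wraps any writer function the
// way the patched console methods wrap the originals.
function makeGuardedWriter(write) {
  let disabled = false;
  return (...args) => {
    if (disabled) return;
    try {
      write(...args);
    } catch (e) {
      if (isBrokenPipeLikeError(e)) {
        disabled = true; // stop writing from now on
        return;
      }
      throw e; // unrelated errors still surface
    }
  };
}

// Simulate a closed pipe: the underlying writer always throws EPIPE.
let calls = 0;
const guarded = makeGuardedWriter(() => {
  calls += 1;
  throw Object.assign(new Error('write EPIPE'), { code: 'EPIPE' });
});
guarded('first');  // error is swallowed and the writer is disabled
guarded('second'); // skipped; the underlying writer is not called again
```

Note that in the diff the `unhandledRejection` handler returns without rethrowing in both branches, so registering it may also silence rejections that are unrelated to broken pipes; the sketch does not model that part.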
@@ -488,7 +488,24 @@ public class WindowFocus {
  [WindowFocus]::Focus([IntPtr]::new(${hwnd}))
    `;
    await executePowerShell(script);
-   console.log(`[AUTOMATION] Focused window handle: ${hwnd}`);
+
+   // Poll to verify focus actually stuck (SetForegroundWindow can be racy / blocked)
+   let verified = false;
+   for (let attempt = 0; attempt < 10; attempt++) {
+     const fg = await getForegroundWindowHandle();
+     if (fg === hwnd) {
+       verified = true;
+       break;
+     }
+     await sleep(50);
+   }
+
+   if (verified) {
+     console.log(`[AUTOMATION] Focused window handle (verified): ${hwnd}`);
+   } else {
+     const fg = await getForegroundWindowHandle();
+     console.warn(`[AUTOMATION] Focus requested for ${hwnd} but foreground is ${fg}`);
+   }
  }

  /**
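Reviewer note: the verification loop above is a bounded poll (up to 10 attempts, 50 ms apart). The same pattern can be sketched as a standalone helper; `verifyForeground` and its parameters are hypothetical names, and the getter is injected so the sketch runs without PowerShell.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: polls an async getter until it returns `expected`
// or the attempt budget is used up. Returns true only if the match was seen.
async function verifyForeground(getForeground, expected, attempts = 10, delayMs = 50) {
  for (let attempt = 0; attempt < attempts; attempt++) {
    if ((await getForeground()) === expected) return true;
    await sleep(delayMs);
  }
  return false;
}

// Usage with a fake getter that reports the expected handle on the third call.
let polls = 0;
const fakeGetter = async () => (++polls >= 3 ? 42 : 7);
const verification = verifyForeground(fakeGetter, 42, 10, 1);
```

The comparison is strict (`===`), as in the diff. Since `getForegroundWindowHandle` returns a number or `null`, verification would never succeed if a caller passed the handle as a string; whether any call site does so is not visible in this diff.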
@@ -1698,6 +1715,29 @@ public class WindowInfo {
    return await executePowerShell(script);
  }

+ /**
+  * Get current foreground window handle (HWND)
+  */
+ async function getForegroundWindowHandle() {
+   const script = `
+ Add-Type -TypeDefinition @"
+ using System;
+ using System.Runtime.InteropServices;
+ public class ForegroundHandle {
+   [DllImport("user32.dll")]
+   public static extern IntPtr GetForegroundWindow();
+   public static long GetHandle() {
+     return GetForegroundWindow().ToInt64();
+   }
+ }
+ "@
+ [ForegroundHandle]::GetHandle()
+ `;
+   const out = await executePowerShell(script);
+   const num = Number(String(out).trim());
+   return Number.isFinite(num) ? num : null;
+ }
+
  /**
   * Execute an action from AI
   * @param {Object} action - Action object from AI
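Reviewer note: `getForegroundWindowHandle` converts the PowerShell output to a number with `Number(String(out).trim())` and falls back to `null` for non-numeric text. The sketch below isolates just that conversion so its edge cases are easy to see; `parseHandle` is a hypothetical name and the sample strings are made up.

```javascript
// Same conversion as in the hunk above, extracted into a pure function.
function parseHandle(out) {
  const num = Number(String(out).trim());
  return Number.isFinite(num) ? num : null;
}

const fromCleanOutput = parseHandle('  1311768\r\n'); // surrounding whitespace is trimmed
const fromEmptyOutput = parseHandle('');              // Number('') is 0, so this is 0, not null
const fromErrorText = parseHandle('Add-Type : error');
const fromUndefined = parseHandle(undefined);         // String(undefined) is not numeric
```

Empty output therefore yields `0` rather than `null`. The callers shown in this diff test the result with `if (fg)`, which treats both values as "no handle", so the difference only matters for code that checks for `null` explicitly.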
@@ -2109,6 +2149,7 @@ module.exports = {
    drag,
    sleep,
    getActiveWindowTitle,
+   getForegroundWindowHandle,
    resolveWindowHandle,
    minimizeWindow,
    restoreWindow,