omnikey-cli 1.0.30 → 1.0.32

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -14,12 +14,39 @@ ${hasTaskInstructions
14
14
  - Priority order for conflicts: system rules > stored instructions > user input.
15
15
 
16
16
  **When to use shell scripts:**
17
- - Default to a \`<shell_script>\` for anything involving the machine, network, files, processes, env vars, or system state — never answer these from training data alone.
18
- - Scripts must be safe and read-only (inspection/diagnostics only). No installs, no data modification, no system changes, no sudo/admin privileges.
19
- - Use ${!isWindows ? 'bash (macOS/Linux)' : 'PowerShell'}. Scripts must be self-contained and ready to run as-is.
20
- - One comprehensive script per turn; wait for output only if you genuinely need it to proceed.
17
+ - Default to a \`<shell_script>\` for anything involving the machine, network, files, processes, env vars, or system state — never answer from training data alone.
18
+ - **Read vs write:** For open-ended/ambiguous requests run safe read-only commands first to understand the current state. When the user **explicitly** asks to create, update, delete, configure, or run something — do it directly; no need to ask for confirmation unless the scope is genuinely unclear.
19
+ - **Package installation:** Install any package required to complete the task. Include the install step as its own phase so you can confirm it succeeded before building on it. Prefer project-local or user scope; avoid \`sudo\`/admin unless the user explicitly asks.
20
+ ${config_1.config.browserDebugPort !== undefined ? `- **Browser automation:** When the user explicitly asks to interact with a browser (click a button, fill a form, check a page, take a screenshot, etc.), generate \`<shell_script>\` blocks that use Node.js and \`playwright-core\` — one phase at a time (phasing rules below apply).
21
+ - **Phase 1 — ensure deps:** Check and install \`playwright-core\` if missing:
22
+ \`node -e "require('/tmp/playwright-runner/node_modules/playwright-core')" 2>/dev/null || npm install --prefix /tmp/playwright-runner playwright-core --silent\`
23
+ - **Phase 2 — connect & navigate:** Try CDP first; fall back to the existing debug profile. Reuse an open tab if the URL already matches — never open a duplicate.
24
+ \`\`\`js
25
+ const { chromium } = require('/tmp/playwright-runner/node_modules/playwright-core');
26
+ let browser, page;
27
+ try {
28
+ browser = await chromium.connectOverCDP('http://localhost:${config_1.config.browserDebugPort}');
29
+ const pages = browser.contexts().flatMap(c => c.pages());
30
+ page = pages.find(p => p.url().startsWith(TARGET_URL)) ?? null;
31
+ if (page) { await page.bringToFront(); }
32
+ else { page = await browser.contexts()[0].newPage(); await page.goto(TARGET_URL, { waitUntil: 'domcontentloaded', timeout: 15000 }); }
33
+ } catch {
34
+ const ctx = await chromium.launchPersistentContext('${config_1.config.browserDebugUserDataDir}', { executablePath: '${config_1.config.browserDebugExecutable}', headless: false });
35
+ browser = ctx;
36
+ page = ctx.pages().find(p => p.url().startsWith(TARGET_URL)) ?? await ctx.newPage();
37
+ if (!page.url().startsWith(TARGET_URL)) await page.goto(TARGET_URL, { waitUntil: 'domcontentloaded', timeout: 15000 });
38
+ }
39
+ \`\`\`
40
+ - **Phase 3+ — one action per script:** Each subsequent script reconnects the same way, finds the already-open tab, performs exactly one action (click / type / select / screenshot / read text), prints the result, then calls \`browser.disconnect()\` (CDP) or just exits (profile launch — leaves the window open).
41
+ - Always inline Node.js via a bash heredoc so the script is self-contained. Print structured output to stdout so it returns as \`TERMINAL OUTPUT:\`.` : ''}
42
+ - Use ${!isWindows ? 'bash (macOS/Linux)' : 'PowerShell'}. Every script must be self-contained and ready to run as-is.
21
43
  - Skip the script only for purely factual/conversational requests with no live data dependency (e.g. "what is 2+2").
22
44
 
45
+ **Script phasing — one phase per turn:**
46
+ - Break every multi-step task into the smallest logical unit that can independently succeed or fail. Emit that script, wait for \`TERMINAL OUTPUT:\`, assess the result, then write the next script. Never combine phases that have independent failure modes into a single block — a mid-script failure loses all context for recovery.
47
+ - Natural phase boundaries: **(1)** check / install dependencies → **(2)** inspect / probe current state → **(3)** make one targeted change → **(4)** verify the change took effect. Add a boundary wherever a failure would require a different next step than a success.
48
+ - Single-step read-only queries ("list files", "show env") need no splitting — one script is fine.
49
+
23
50
  **When to use web tools:**
24
51
  - Use the built-in \`web_fetch\` tool when the user provides a URL that must be retrieved.
25
52
  - Use the built-in \`web_search\` tool when the user asks to search online, or when current information (prices, docs, recent events) is needed.
@@ -31,8 +58,11 @@ ${hasTaskInstructions
31
58
  - After the tool call returns, provide a \`<final_answer>\` that includes the saved file path.
32
59
 
33
60
  **Incoming message tags:**
34
- - \`TERMINAL OUTPUT:\` — stdout/stderr from a prior script. Analyze it immediately and respond with EITHER a follow-up \`<shell_script>\` (if more data is needed) OR a \`<final_answer>\` (if you have enough to conclude). You MUST pick one never respond with plain text.
35
- - \`COMMAND ERROR:\` script failed. Diagnose and emit a corrected \`<shell_script>\` or explain in \`<final_answer>\`.
61
+ - \`TERMINAL OUTPUT:\` — output from the last script. You MUST assess it before proceeding:
62
+ - Phase succeeded emit the **next phase** as a new \`<shell_script>\`, or \`<final_answer>\` if the task is complete.
63
+ - Phase failed or produced unexpected output → emit a targeted corrective \`<shell_script>\` that fixes only what failed. Do not restart from scratch unless the failure is fundamental.
64
+ Never skip assessment — never assume the previous phase succeeded without reading its output.
65
+ - \`COMMAND ERROR:\` — script exited non-zero. Diagnose the specific line that failed, then emit a corrected \`<shell_script>\` scoped to that failure.
36
66
  - No prefix — direct user message; treat as the primary request.
37
67
 
38
68
  **Response format — every response must be exactly one of:**
@@ -84,7 +84,7 @@ async function runToolLoop(initialResult, session, sessionId, send, log, tools,
84
84
  content: `Generating image: "${prompt.slice(0, 100)}${prompt.length > 100 ? '...' : ''}"`,
85
85
  is_terminal_output: false,
86
86
  is_error: false,
87
- is_web_call: false,
87
+ is_web_call: true,
88
88
  });
89
89
  const toolResult = await (0, imageTool_1.executeImageGenerationTool)(args, log);
90
90
  log.info('Tool call completed', {
@@ -164,7 +164,7 @@ const aiModel = (0, ai_client_1.getDefaultModel)(config_1.config.aiProvider, 'sm
164
164
  // In-memory cache: sessionId -> live SessionState. Hydrated from DB on first
165
165
  // access and written back after each turn so restarts resume correctly.
166
166
  const sessionMessages = new Map();
167
- const MAX_TURNS = 10;
167
+ const MAX_TURNS = 20;
168
168
  // ─── DB helpers ───────────────────────────────────────────────────────────────
169
169
  async function persistSessionToDB(sessionId, state) {
170
170
  try {
@@ -101,6 +101,9 @@ exports.config = {
101
101
  const n = parseInt(raw, 10);
102
102
  return Number.isNaN(n) ? undefined : n;
103
103
  })(),
104
+ browserDebugBrowserName: getEnv('BROWSER_DEBUG_BROWSER_NAME', false),
105
+ browserDebugExecutable: getEnv('BROWSER_DEBUG_EXECUTABLE', false),
106
+ browserDebugUserDataDir: getEnv('BROWSER_DEBUG_USER_DATA_DIR', false),
104
107
  // GCS download-count tracking (both must be set to enable counting)
105
108
  gcsBucketName: getEnv('GCS_BUCKET_NAME', false),
106
109
  gcsDownloadCountObject: getEnv('GCS_DOWNLOAD_COUNT_OBJECT', false),
@@ -25,7 +25,7 @@ const FINAL_ANSWER_RE = /<final_answer>/;
25
25
  const JOB_TIMEOUT_MS = 10 * 60 * 1000;
26
26
  // Cron jobs get more turns than interactive sessions so multi-step tasks
27
27
  // (web research → shell commands → final answer) can complete unattended.
28
- const MAX_CRON_TURNS = 20;
28
+ const MAX_CRON_TURNS = 30;
29
29
  function computeNextRunAt(cronExpression, runAt) {
30
30
  if (cronExpression) {
31
31
  try {
@@ -158,7 +158,7 @@ function runCronJob(job, subscription, sessionId) {
158
158
  sender: 'user',
159
159
  content: job.prompt,
160
160
  platform: job.platform ?? undefined,
161
- }, send, logger_1.logger, { maxTurns: MAX_CRON_TURNS }).catch((err) => settle(err instanceof Error ? err : new Error(String(err))));
161
+ }, send, logger_1.logger, { maxTurns: MAX_CRON_TURNS, isCronJob: true }).catch((err) => settle(err instanceof Error ? err : new Error(String(err))));
162
162
  });
163
163
  }
164
164
  async function executeJob(job) {
@@ -212,15 +212,74 @@ function getRunningBrowserNames() {
212
212
  }
213
213
  return running;
214
214
  }
215
- // ─── Strategy -1: CDP via DevToolsActivePort ─────────────────────────────────
216
- //
217
- // When Chrome is launched with --remote-debugging-port (or --remote-debugging-port=0
218
- // to let it pick a free port), it writes a DevToolsActivePort file to the user data
219
- // directory containing the actual port. Connecting via CDP gives us direct access
220
- // to the live, JS-rendered tab content without AppleScript permissions or cookie
221
- // decryption. This is the fastest and most reliable path when available.
222
- async function fetchWithCDP(url, browsersWithUrl, log) {
215
+ /**
216
+ * ─── Strategy -1: CDP via DevToolsActivePort ─────────────────────────────────
217
+ * When Chrome is launched with --remote-debugging-port (or --remote-debugging-port=0
218
+ * to let it pick a free port), it writes a DevToolsActivePort file to the user data
219
+ * directory containing the actual port. Connecting via CDP gives us direct access
220
+ * to the live, JS-rendered tab content without AppleScript permissions or cookie
221
+ * decryption. This is the fastest and most reliable path when available.
222
+ */
223
+ async function fetchWithCDP(url, workingPorts, log) {
223
224
  const targetBase = url.split('?')[0]; // strip query for prefix match
225
+ for (const port of workingPorts) {
226
+ log.info('browser-playwright: CDP — debug endpoint found, connecting', { port });
227
+ let cdpBrowser = null;
228
+ try {
229
+ cdpBrowser = await playwright_core_1.default.chromium.connectOverCDP(`http://localhost:${port}`, {
230
+ timeout: 5000,
231
+ });
232
+ let matchedPage = null;
233
+ for (const context of cdpBrowser.contexts()) {
234
+ for (const page of context.pages()) {
235
+ if (page.url().startsWith(targetBase)) {
236
+ matchedPage = page;
237
+ break;
238
+ }
239
+ }
240
+ if (matchedPage)
241
+ break;
242
+ }
243
+ if (!matchedPage) {
244
+ log.debug('browser-playwright: CDP — no tab found matching URL', { port, url });
245
+ continue;
246
+ }
247
+ log.info('browser-playwright: CDP — tab found, extracting content', {
248
+ port,
249
+ tabUrl: matchedPage.url(),
250
+ });
251
+ try {
252
+ await matchedPage.waitForFunction(() => (document.body?.innerText ?? '').trim().length > 200, { timeout: 5000 });
253
+ }
254
+ catch {
255
+ // Best-effort — extract whatever is rendered so far
256
+ }
257
+ const content = await matchedPage.evaluate(() => document.body.innerText ?? document.body.textContent ?? '');
258
+ log.info('browser-playwright: CDP — content extracted', {
259
+ port,
260
+ contentLength: content.trim().length,
261
+ });
262
+ const trimmed = content.trim();
263
+ return trimmed ? { content: trimmed, finalUrl: matchedPage.url() } : null;
264
+ }
265
+ catch (err) {
266
+ log.warn('browser-playwright: CDP — connection failed', {
267
+ port,
268
+ error: err instanceof Error ? err.message.split('\n')[0] : String(err),
269
+ });
270
+ }
271
+ finally {
272
+ if (cdpBrowser) {
273
+ try {
274
+ await cdpBrowser.close();
275
+ }
276
+ catch { }
277
+ }
278
+ }
279
+ }
280
+ return null;
281
+ }
282
+ async function getWorkingCdpPorts(browsersWithUrl, log) {
224
283
  // Collect candidate ports:
225
284
  // 1. DevToolsActivePort file (written when Chrome was started with --remote-debugging-port)
226
285
  // 2. Well-known default ports developers commonly use
@@ -294,76 +353,26 @@ async function fetchWithCDP(url, browsersWithUrl, log) {
294
353
  candidatePorts.splice(candidatePorts.indexOf(config_1.config.browserDebugPort), 1);
295
354
  candidatePorts.unshift(config_1.config.browserDebugPort);
296
355
  }
356
+ const workingPorts = [];
297
357
  for (const port of candidatePorts) {
298
- // Quick HTTP probe: /json/version returns immediately if the debug endpoint is up.
299
- let endpointUp = false;
300
358
  try {
301
359
  // Use 127.0.0.1 explicitly — on Windows, `localhost` may resolve to ::1
302
360
  // while Chrome binds its debug endpoint to 127.0.0.1 only.
303
361
  const probe = await axios_1.default.get(`http://127.0.0.1:${port}/json/version`, { timeout: 800 });
304
- endpointUp = probe.status === 200;
362
+ if (probe.status === 200) {
363
+ workingPorts.push(port);
364
+ }
305
365
  }
306
366
  catch {
307
367
  // Port not listening — skip without logging noise
308
- continue;
309
- }
310
- if (!endpointUp)
311
- continue;
312
- log.info('browser-playwright: CDP — debug endpoint found, connecting', { port });
313
- let cdpBrowser = null;
314
- try {
315
- cdpBrowser = await playwright_core_1.default.chromium.connectOverCDP(`http://localhost:${port}`, {
316
- timeout: 5000,
317
- });
318
- let matchedPage = null;
319
- for (const context of cdpBrowser.contexts()) {
320
- for (const page of context.pages()) {
321
- if (page.url().startsWith(targetBase)) {
322
- matchedPage = page;
323
- break;
324
- }
325
- }
326
- if (matchedPage)
327
- break;
328
- }
329
- if (!matchedPage) {
330
- log.debug('browser-playwright: CDP — no tab found matching URL', { port, url });
331
- continue;
332
- }
333
- log.info('browser-playwright: CDP — tab found, extracting content', {
334
- port,
335
- tabUrl: matchedPage.url(),
336
- });
337
- try {
338
- await matchedPage.waitForFunction(() => (document.body?.innerText ?? '').trim().length > 200, { timeout: 5000 });
339
- }
340
- catch {
341
- // Best-effort — extract whatever is rendered so far
342
- }
343
- const content = await matchedPage.evaluate(() => document.body.innerText ?? document.body.textContent ?? '');
344
- log.info('browser-playwright: CDP — content extracted', {
345
- port,
346
- contentLength: content.trim().length,
347
- });
348
- const trimmed = content.trim();
349
- return trimmed ? { content: trimmed, finalUrl: matchedPage.url() } : null;
350
- }
351
- catch (err) {
352
- log.warn('browser-playwright: CDP — connection failed', {
353
- port,
354
- error: err instanceof Error ? err.message.split('\n')[0] : String(err),
355
- });
356
- }
357
- finally {
358
- if (cdpBrowser) {
359
- try {
360
- await cdpBrowser.close();
361
- }
362
- catch { }
363
- }
364
368
  }
365
369
  }
366
- return null;
370
+ log.debug('browser-playwright: CDP — candidate port probe complete', {
371
+ candidateCount: candidatePorts.length,
372
+ workingCount: workingPorts.length,
373
+ workingPorts,
374
+ });
375
+ return workingPorts;
367
376
  }
368
377
  // ─── Public API ───────────────────────────────────────────────────────────────
369
378
  /**
@@ -713,7 +722,8 @@ async function fetchWithPlaywright(url, log) {
713
722
  url,
714
723
  browsers: [...browsersWithUrl],
715
724
  });
716
- const cdpResult = await fetchWithCDP(url, browsersWithUrl, log);
725
+ const workingPorts = await getWorkingCdpPorts(browsersWithUrl, log);
726
+ const cdpResult = await fetchWithCDP(url, workingPorts, log);
717
727
  if (cdpResult)
718
728
  return cdpResult.content;
719
729
  const liveContent = await fetchFromRunningBrowserTab(url, browsersWithUrl, log);
package/package.json CHANGED
@@ -4,7 +4,7 @@
4
4
  "access": "public",
5
5
  "registry": "https://registry.npmjs.org/"
6
6
  },
7
- "version": "1.0.30",
7
+ "version": "1.0.32",
8
8
  "description": "CLI for onboarding users to Omnikey AI and configuring OPENAI_API_KEY. Use Yarn for install/build.",
9
9
  "engines": {
10
10
  "node": ">=14.0.0",