npm - omnikey-cli - Versions diffs - 1.0.30 → 1.0.32 - Mend

omnikey-cli 1.0.30 → 1.0.32

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6) hide show

package/backend-dist/agent/agentPrompts.js +36 -6
package/backend-dist/agent/agentServer.js +2 -2
package/backend-dist/config.js +3 -0
package/backend-dist/scheduledJobExecutor.js +2 -2
package/backend-dist/web-search/browser-playwright.js +79 -69
package/package.json +1 -1

package/backend-dist/agent/agentPrompts.js CHANGED Viewed

@@ -14,12 +14,39 @@ ${hasTaskInstructions
 - Priority order for conflicts: system rules > stored instructions > user input.
 **When to use shell scripts:**
-- Default to a \`<shell_script>\` for anything involving the machine, network, files, processes, env vars, or system state — never answer these from training data alone.
-- Scripts must be safe and read-only (inspection/diagnostics only). No installs, no data modification, no system changes, no sudo/admin privileges.
-- Use ${!isWindows ? 'bash (macOS/Linux)' : 'PowerShell'}. Scripts must be self-contained and ready to run as-is.
-- One comprehensive script per turn; wait for output only if you genuinely need it to proceed.
+- Default to a \`<shell_script>\` for anything involving the machine, network, files, processes, env vars, or system state — never answer from training data alone.
+- **Read vs write:** For open-ended/ambiguous requests run safe read-only commands first to understand the current state. When the user **explicitly** asks to create, update, delete, configure, or run something — do it directly; no need to ask for confirmation unless the scope is genuinely unclear.
+- **Package installation:** Install any package required to complete the task. Include the install step as its own phase so you can confirm it succeeded before building on it. Prefer project-local or user scope; avoid \`sudo\`/admin unless the user explicitly asks.
+${config_1.config.browserDebugPort !== undefined ? `- **Browser automation:** When the user explicitly asks to interact with a browser (click a button, fill a form, check a page, take a screenshot, etc.), generate \`<shell_script>\` blocks that use Node.js and \`playwright-core\` — one phase at a time (phasing rules below apply).
+  - **Phase 1 — ensure deps:** Check and install \`playwright-core\` if missing:
+    \`node -e "require('/tmp/playwright-runner/node_modules/playwright-core')" 2>/dev/null || npm install --prefix /tmp/playwright-runner playwright-core --silent\`
+  - **Phase 2 — connect & navigate:** Try CDP first; fall back to the existing debug profile. Reuse an open tab if the URL already matches — never open a duplicate.
+    \`\`\`js
+    const { chromium } = require('/tmp/playwright-runner/node_modules/playwright-core');
+    let browser, page;
+    try {
+      browser = await chromium.connectOverCDP('http://localhost:${config_1.config.browserDebugPort}');
+      const pages = browser.contexts().flatMap(c => c.pages());
+      page = pages.find(p => p.url().startsWith(TARGET_URL)) ?? null;
+      if (page) { await page.bringToFront(); }
+      else { page = await browser.contexts()[0].newPage(); await page.goto(TARGET_URL, { waitUntil: 'domcontentloaded', timeout: 15000 }); }
+    } catch {
+      const ctx = await chromium.launchPersistentContext('${config_1.config.browserDebugUserDataDir}', { executablePath: '${config_1.config.browserDebugExecutable}', headless: false });
+      browser = ctx;
+      page = ctx.pages().find(p => p.url().startsWith(TARGET_URL)) ?? await ctx.newPage();
+      if (!page.url().startsWith(TARGET_URL)) await page.goto(TARGET_URL, { waitUntil: 'domcontentloaded', timeout: 15000 });
+    }
+    \`\`\`
+  - **Phase 3+ — one action per script:** Each subsequent script reconnects the same way, finds the already-open tab, performs exactly one action (click / type / select / screenshot / read text), prints the result, then calls \`browser.disconnect()\` (CDP) or just exits (profile launch — leaves the window open).
+  - Always inline Node.js via a bash heredoc so the script is self-contained. Print structured output to stdout so it returns as \`TERMINAL OUTPUT:\`.` : ''}
+- Use ${!isWindows ? 'bash (macOS/Linux)' : 'PowerShell'}. Every script must be self-contained and ready to run as-is.
 - Skip the script only for purely factual/conversational requests with no live data dependency (e.g. "what is 2+2").
+**Script phasing — one phase per turn:**
+- Break every multi-step task into the smallest logical unit that can independently succeed or fail. Emit that script, wait for \`TERMINAL OUTPUT:\`, assess the result, then write the next script. Never combine phases that have independent failure modes into a single block — a mid-script failure loses all context for recovery.
+- Natural phase boundaries: **(1)** check / install dependencies → **(2)** inspect / probe current state → **(3)** make one targeted change → **(4)** verify the change took effect. Add a boundary wherever a failure would require a different next step than a success.
+- Single-step read-only queries ("list files", "show env") need no splitting — one script is fine.
 **When to use web tools:**
 - Use the built-in \`web_fetch\` tool when the user provides a URL that must be retrieved.
 - Use the built-in \`web_search\` tool when the user asks to search online, or when current information (prices, docs, recent events) is needed.
@@ -31,8 +58,11 @@ ${hasTaskInstructions
 - After the tool call returns, provide a \`<final_answer>\` that includes the saved file path.
 **Incoming message tags:**
-- \`TERMINAL OUTPUT:\` — stdout/stderr from a prior script. Analyze it immediately and respond with EITHER a follow-up \`<shell_script>\` (if more data is needed) OR a \`<final_answer>\` (if you have enough to conclude). You MUST pick one — never respond with plain text.
-- \`COMMAND ERROR:\` — script failed. Diagnose and emit a corrected \`<shell_script>\` or explain in \`<final_answer>\`.
+- \`TERMINAL OUTPUT:\` — output from the last script. You MUST assess it before proceeding:
+  - Phase succeeded → emit the **next phase** as a new \`<shell_script>\`, or \`<final_answer>\` if the task is complete.
+  - Phase failed or produced unexpected output → emit a targeted corrective \`<shell_script>\` that fixes only what failed. Do not restart from scratch unless the failure is fundamental.
+  Never skip assessment — never assume the previous phase succeeded without reading its output.
+- \`COMMAND ERROR:\` — script exited non-zero. Diagnose the specific line that failed, then emit a corrected \`<shell_script>\` scoped to that failure.
 - No prefix — direct user message; treat as the primary request.
 **Response format — every response must be exactly one of:**

package/backend-dist/agent/agentServer.js CHANGED Viewed

@@ -84,7 +84,7 @@ async function runToolLoop(initialResult, session, sessionId, send, log, tools,
                     content: `Generating image: "${prompt.slice(0, 100)}${prompt.length > 100 ? '...' : ''}"`,
                     is_terminal_output: false,
                     is_error: false,
-                    is_web_call: false,
+                    is_web_call: true,
                 });
                 const toolResult = await (0, imageTool_1.executeImageGenerationTool)(args, log);
                 log.info('Tool call completed', {
@@ -164,7 +164,7 @@ const aiModel = (0, ai_client_1.getDefaultModel)(config_1.config.aiProvider, 'sm
 // In-memory cache: sessionId -> live SessionState. Hydrated from DB on first
 // access and written back after each turn so restarts resume correctly.
 const sessionMessages = new Map();
-const MAX_TURNS = 10;
+const MAX_TURNS = 20;
 // ─── DB helpers ───────────────────────────────────────────────────────────────
 async function persistSessionToDB(sessionId, state) {
     try {

package/backend-dist/config.js CHANGED Viewed

@@ -101,6 +101,9 @@ exports.config = {
         const n = parseInt(raw, 10);
         return Number.isNaN(n) ? undefined : n;
     })(),
+    browserDebugBrowserName: getEnv('BROWSER_DEBUG_BROWSER_NAME', false),
+    browserDebugExecutable: getEnv('BROWSER_DEBUG_EXECUTABLE', false),
+    browserDebugUserDataDir: getEnv('BROWSER_DEBUG_USER_DATA_DIR', false),
     // GCS download-count tracking (both must be set to enable counting)
     gcsBucketName: getEnv('GCS_BUCKET_NAME', false),
     gcsDownloadCountObject: getEnv('GCS_DOWNLOAD_COUNT_OBJECT', false),

package/backend-dist/scheduledJobExecutor.js CHANGED Viewed

@@ -25,7 +25,7 @@ const FINAL_ANSWER_RE = /<final_answer>/;
 const JOB_TIMEOUT_MS = 10 * 60 * 1000;
 // Cron jobs get more turns than interactive sessions so multi-step tasks
 // (web research → shell commands → final answer) can complete unattended.
-const MAX_CRON_TURNS = 20;
+const MAX_CRON_TURNS = 30;
 function computeNextRunAt(cronExpression, runAt) {
     if (cronExpression) {
         try {
@@ -158,7 +158,7 @@ function runCronJob(job, subscription, sessionId) {
             sender: 'user',
             content: job.prompt,
             platform: job.platform ?? undefined,
-        }, send, logger_1.logger, { maxTurns: MAX_CRON_TURNS }).catch((err) => settle(err instanceof Error ? err : new Error(String(err))));
+        }, send, logger_1.logger, { maxTurns: MAX_CRON_TURNS, isCronJob: true }).catch((err) => settle(err instanceof Error ? err : new Error(String(err))));
     });
 }
 async function executeJob(job) {

package/backend-dist/web-search/browser-playwright.js CHANGED Viewed

@@ -212,15 +212,74 @@ function getRunningBrowserNames() {
     }
     return running;
 }
-// ─── Strategy -1: CDP via DevToolsActivePort ─────────────────────────────────
-//
-// When Chrome is launched with --remote-debugging-port (or --remote-debugging-port=0
-// to let it pick a free port), it writes a DevToolsActivePort file to the user data
-// directory containing the actual port. Connecting via CDP gives us direct access
-// to the live, JS-rendered tab content without AppleScript permissions or cookie
-// decryption. This is the fastest and most reliable path when available.
-async function fetchWithCDP(url, browsersWithUrl, log) {
+/**
+ * ─── Strategy -1: CDP via DevToolsActivePort ─────────────────────────────────
+ * When Chrome is launched with --remote-debugging-port (or --remote-debugging-port=0
+ * to let it pick a free port), it writes a DevToolsActivePort file to the user data
+ * directory containing the actual port. Connecting via CDP gives us direct access
+ * to the live, JS-rendered tab content without AppleScript permissions or cookie
+ * decryption. This is the fastest and most reliable path when available.
+ */
+async function fetchWithCDP(url, workingPorts, log) {
     const targetBase = url.split('?')[0]; // strip query for prefix match
+    for (const port of workingPorts) {
+        log.info('browser-playwright: CDP — debug endpoint found, connecting', { port });
+        let cdpBrowser = null;
+        try {
+            cdpBrowser = await playwright_core_1.default.chromium.connectOverCDP(`http://localhost:${port}`, {
+                timeout: 5000,
+            });
+            let matchedPage = null;
+            for (const context of cdpBrowser.contexts()) {
+                for (const page of context.pages()) {
+                    if (page.url().startsWith(targetBase)) {
+                        matchedPage = page;
+                        break;
+                    }
+                }
+                if (matchedPage)
+                    break;
+            }
+            if (!matchedPage) {
+                log.debug('browser-playwright: CDP — no tab found matching URL', { port, url });
+                continue;
+            }
+            log.info('browser-playwright: CDP — tab found, extracting content', {
+                port,
+                tabUrl: matchedPage.url(),
+            });
+            try {
+                await matchedPage.waitForFunction(() => (document.body?.innerText ?? '').trim().length > 200, { timeout: 5000 });
+            }
+            catch {
+                // Best-effort — extract whatever is rendered so far
+            }
+            const content = await matchedPage.evaluate(() => document.body.innerText ?? document.body.textContent ?? '');
+            log.info('browser-playwright: CDP — content extracted', {
+                port,
+                contentLength: content.trim().length,
+            });
+            const trimmed = content.trim();
+            return trimmed ? { content: trimmed, finalUrl: matchedPage.url() } : null;
+        }
+        catch (err) {
+            log.warn('browser-playwright: CDP — connection failed', {
+                port,
+                error: err instanceof Error ? err.message.split('\n')[0] : String(err),
+            });
+        }
+        finally {
+            if (cdpBrowser) {
+                try {
+                    await cdpBrowser.close();
+                }
+                catch { }
+            }
+        }
+    }
+    return null;
+}
+async function getWorkingCdpPorts(browsersWithUrl, log) {
     // Collect candidate ports:
     //   1. DevToolsActivePort file (written when Chrome was started with --remote-debugging-port)
     //   2. Well-known default ports developers commonly use
@@ -294,76 +353,26 @@ async function fetchWithCDP(url, browsersWithUrl, log) {
         candidatePorts.splice(candidatePorts.indexOf(config_1.config.browserDebugPort), 1);
         candidatePorts.unshift(config_1.config.browserDebugPort);
     }
+    const workingPorts = [];
     for (const port of candidatePorts) {
-        // Quick HTTP probe: /json/version returns immediately if the debug endpoint is up.
-        let endpointUp = false;
         try {
             // Use 127.0.0.1 explicitly — on Windows, `localhost` may resolve to ::1
             // while Chrome binds its debug endpoint to 127.0.0.1 only.
             const probe = await axios_1.default.get(`http://127.0.0.1:${port}/json/version`, { timeout: 800 });
-            endpointUp = probe.status === 200;
+            if (probe.status === 200) {
+                workingPorts.push(port);
+            }
         }
         catch {
             // Port not listening — skip without logging noise
-            continue;
-        }
-        if (!endpointUp)
-            continue;
-        log.info('browser-playwright: CDP — debug endpoint found, connecting', { port });
-        let cdpBrowser = null;
-        try {
-            cdpBrowser = await playwright_core_1.default.chromium.connectOverCDP(`http://localhost:${port}`, {
-                timeout: 5000,
-            });
-            let matchedPage = null;
-            for (const context of cdpBrowser.contexts()) {
-                for (const page of context.pages()) {
-                    if (page.url().startsWith(targetBase)) {
-                        matchedPage = page;
-                        break;
-                    }
-                }
-                if (matchedPage)
-                    break;
-            }
-            if (!matchedPage) {
-                log.debug('browser-playwright: CDP — no tab found matching URL', { port, url });
-                continue;
-            }
-            log.info('browser-playwright: CDP — tab found, extracting content', {
-                port,
-                tabUrl: matchedPage.url(),
-            });
-            try {
-                await matchedPage.waitForFunction(() => (document.body?.innerText ?? '').trim().length > 200, { timeout: 5000 });
-            }
-            catch {
-                // Best-effort — extract whatever is rendered so far
-            }
-            const content = await matchedPage.evaluate(() => document.body.innerText ?? document.body.textContent ?? '');
-            log.info('browser-playwright: CDP — content extracted', {
-                port,
-                contentLength: content.trim().length,
-            });
-            const trimmed = content.trim();
-            return trimmed ? { content: trimmed, finalUrl: matchedPage.url() } : null;
-        }
-        catch (err) {
-            log.warn('browser-playwright: CDP — connection failed', {
-                port,
-                error: err instanceof Error ? err.message.split('\n')[0] : String(err),
-            });
-        }
-        finally {
-            if (cdpBrowser) {
-                try {
-                    await cdpBrowser.close();
-                }
-                catch { }
-            }
         }
     }
-    return null;
+    log.debug('browser-playwright: CDP — candidate port probe complete', {
+        candidateCount: candidatePorts.length,
+        workingCount: workingPorts.length,
+        workingPorts,
+    });
+    return workingPorts;
 }
 // ─── Public API ───────────────────────────────────────────────────────────────
 /**
@@ -713,7 +722,8 @@ async function fetchWithPlaywright(url, log) {
         url,
         browsers: [...browsersWithUrl],
     });
-    const cdpResult = await fetchWithCDP(url, browsersWithUrl, log);
+    const workingPorts = await getWorkingCdpPorts(browsersWithUrl, log);
+    const cdpResult = await fetchWithCDP(url, workingPorts, log);
     if (cdpResult)
         return cdpResult.content;
     const liveContent = await fetchFromRunningBrowserTab(url, browsersWithUrl, log);

package/package.json CHANGED Viewed

@@ -4,7 +4,7 @@
     "access": "public",
     "registry": "https://registry.npmjs.org/"
   },
-  "version": "1.0.30",
+  "version": "1.0.32",
   "description": "CLI for onboarding users to Omnikey AI and configuring OPENAI_API_KEY. Use Yarn for install/build.",
   "engines": {
     "node": ">=14.0.0",