alvin-bot 4.8.8 → 4.8.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,34 @@
 
  All notable changes to Alvin Bot are documented here.
 
+ ## [4.8.9] - 2026-04-11
+
+ ### 🐛 Browser automation: dead `browse-server.cjs` path removed, 3-tier router now the source of truth
+
+ The `browse` skill used to instruct the agent to start `node scripts/browse-server.cjs` on port 3800 for every browser task. That file was deleted in an earlier cleanup (see `20283c9` for the original 577-line version, now gone), but `skills/browse/SKILL.md` was never updated. As a result, any browser-related user message on Telegram (or any cron job that hit the skill) got a system-prompt injection telling it to call a gateway that didn't exist, producing half-failed runs such as the "Daily Job Alert" cron that couldn't load LinkedIn or StepStone.
+
+ **What changed:**
+
+ - **`skills/browse/SKILL.md`: full rewrite.** It now documents the hub 3-tier router at `~/.claude/hub/SCRIPTS/browser.sh`:
+   - **Tier 0**: WebFetch / `curl` for static pages and APIs
+   - **Tier 1**: `browser.sh stealth <url>` (Playwright + stealth plugin, headless, Cloudflare masking)
+   - **Tier 2**: `browser.sh cdp {start|goto|shot|tabs|stop}` (real Chrome with a persistent profile at `~/.claude/hub/BROWSER/profile/`; login cookies survive restarts)
+   - **Tier 3**: Claude-in-Chrome extension via MCP tools (interactive CLI only)
+   - An explicit escalation ladder (WebFetch → stealth → CDP → ask Ali to log in) and a `NIEMALS browse-server.cjs nutzen` ("never use browse-server.cjs") anti-rule.
+   - Concrete working targets (StepStone ✅, Michael Page ✅, LinkedIn ✅ with login, Indeed ❌) so the agent knows what to try where.
+
+ - **`src/services/browser-manager.ts`: hardened fallback chain.** The multi-strategy manager already had the right *shape* (`gateway → cdp → hub-stealth → cli`), but several operations silently broke or hung:
+   - **`gatewayRequest` now has a 15 s timeout** (`req.destroy` on elapse). Previously a hung gateway would wedge the caller forever.
+   - **CDP fallback for interactive ops.** `click`, `fill`, `type`, `press`, `scroll`, `evaluate`, `info`, and `getTree` used to hard-throw `"requires gateway"` when `browse-server.cjs` wasn't running. They now try the gateway first, then a short-lived `chromium.connectOverCDP()` connection via a new `withCdpPage()` helper that reuses Ali's live Chrome on port 9222. When the gateway is absent, refs are interpreted as CSS selectors.
+   - **Explicit PNG extension** on auto-generated screenshot filenames (`shot_<ts>.png`) so Playwright's format inference is unambiguous.
+   - **Better error messages**: every "needs interactive" throw now includes the exact command to start CDP Chrome (`~/.claude/hub/SCRIPTS/browser.sh cdp start headless`).
+
+ - **`src/paths.ts`: `HUB_BROWSER_SH` constant.** A new absolute path to `~/.claude/hub/SCRIPTS/browser.sh`, so the manager can shell out without hard-coding `os.homedir()` inline.
+
+ **Why this matters:** `browser-manager.ts` is still not wired into any bot code path (it's future-proofing), so the production fix for user-interactive flows is `SKILL.md`. The manager hardening ensures that once it is wired into a sub-agent tool, it won't hang on a missing gateway or lose all interactive capability when only CDP is available.
+
+ **Testing:** Tier 1 stealth end-to-end against `stepstone.de/jobs/it-delivery-director` → 1.2 MB HTML, title parsed. Module-level integration test: `navigate('https://example.com')` via auto-selected hub-stealth → correct title/URL. `resolveStrategy('gateway')` → cascades to CDP with a visible warning. `info()` via CDP fallback → returns live Chrome state without throwing. Skills reload picks up the new SKILL.md (5977 chars), `matchSkills("browse linkedin")` hits the browse skill, and `buildSkillContext("open stepstone.de")` injects the 3-tier guidance block.
+
  ## [4.8.8] - 2026-04-11
 
  ### ✨ Unlimited sub-agent & cron timeouts (user-configurable)
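The escalation ladder added to `SKILL.md` boils down to a simple rule: start at the cheapest rung that can plausibly handle the page, and climb one rung each time a tier is blocked. The JavaScript sketch below encodes that rule under stated assumptions. The page traits (`needsLogin`, `needsJs`), the rung names, and the `tryRung` callback are hypothetical labels, because the skill expresses these rules as prose for the agent rather than as code. Tier 3 (the Claude-in-Chrome extension) is left out because it is interactive-CLI only.

```javascript
// Illustrative encoding of the browse skill's escalation ladder.
// Rung names and page traits are hypothetical labels, not alvin-bot APIs.
const LADDER = ["webfetch", "stealth", "cdp", "ask-user-to-login"];

// Cheapest rung that can plausibly handle the page.
function startingRung(page) {
    if (page.needsLogin) return "cdp"; // real Chrome with persistent cookies
    if (page.needsJs) return "stealth"; // JS rendering or Cloudflare masking
    return "webfetch"; // static page or API
}

// Climb one rung at a time until a rung succeeds. `tryRung(rung)` returns
// "ok" or "blocked" (for example on a 403 or a captcha). The top rung is a
// hand-off to the user, so it is recorded but never "tried".
function escalate(page, tryRung) {
    const attempts = [];
    for (let i = LADDER.indexOf(startingRung(page)); i < LADDER.length; i++) {
        const rung = LADDER[i];
        attempts.push(rung);
        if (rung === "ask-user-to-login" || tryRung(rung) === "ok") break;
    }
    return attempts;
}

// Usage: a public page behind Cloudflare where stealth is also blocked.
const tried = escalate({ needsJs: true }, (rung) => (rung === "stealth" ? "blocked" : "ok"));
console.log(tried.join(" -> ")); // stealth -> cdp
```

Starting at the cheapest *plausible* rung, rather than always at WebFetch, is what lets login-only sites such as LinkedIn go straight to CDP instead of burning a stealth attempt first.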
package/dist/paths.js CHANGED
@@ -86,6 +86,8 @@ export const AGENTS_FILE = resolve(DATA_DIR, "AGENTS.md");
  export const HOOKS_DIR = resolve(DATA_DIR, "hooks");
  /** scripts/browse-server.cjs - HTTP gateway for persistent browser sessions */
  export const BROWSE_SERVER_SCRIPT = resolve(BOT_ROOT, "scripts", "browse-server.cjs");
+ /** ~/.claude/hub/SCRIPTS/browser.sh - Hub 3-tier browser router (stealth, CDP, ext) */
+ export const HUB_BROWSER_SH = resolve(os.homedir(), ".claude", "hub", "SCRIPTS", "browser.sh");
  /** data/exec-allowlist.json - User-defined exec allowlist */
  export const EXEC_ALLOWLIST_FILE = resolve(DATA_DIR, "exec-allowlist.json");
  /** assets/ - User asset files (CVs, cover letters, legal docs, photos) */
@@ -1,34 +1,166 @@
  /**
-  * Multi-Strategy Browser Manager
+  * Multi-Strategy Browser Manager - with automatic fallback chain.
   *
-  * Auto-selects between three browser strategies:
-  * - CLI: Headless Playwright, one-shot (screenshots, text extraction, PDF)
-  * - Gateway: Persistent HTTP browser server (interactive browsing, form-filling)
-  * - CDP: Attach to user's live Chrome via DevTools Protocol
+  * Strategy priority:
+  * 1. Gateway (browse-server.cjs HTTP server) - if script exists and is running
+  * 2. CDP (Chrome DevTools Protocol) - via hub browser.sh cdp, persistent cookies
+  * 3. Hub Stealth (Playwright + stealth plugin) - via hub browser.sh stealth
+  * 4. Raw CLI (bare Playwright) - last resort, easily blocked
+  *
+  * If a strategy is unavailable, we automatically cascade to the next one
+  * and log a warning so failures are visible, not silent.
   */
- import { spawn } from "child_process";
+ import { execSync, spawn } from "child_process";
  import http from "http";
  import fs from "fs";
  import { config } from "../config.js";
- import { BROWSE_SERVER_SCRIPT } from "../paths.js";
+ import { BROWSE_SERVER_SCRIPT, HUB_BROWSER_SH } from "../paths.js";
  import { screenshotUrl, extractText, generatePdf } from "./browser.js";
- /** Auto-select the best browser strategy for a task */
+ const CDP_PORT = 9222;
+ const EXEC_TIMEOUT = 60_000; // 60s for page loads via shell
+ // ── Logging ──────────────────────────────────────────────────────────
+ function log(msg) {
+     console.warn(`[browser-manager] ${msg}`);
+ }
+ // ── Availability Checks ──────────────────────────────────────────────
+ function isGatewayScriptPresent() {
+     return fs.existsSync(BROWSE_SERVER_SCRIPT);
+ }
+ async function isGatewayRunning() {
+     try {
+         const health = await gatewayRequest("/health");
+         return !!health?.ok;
+     }
+     catch {
+         return false;
+     }
+ }
+ function isHubBrowserAvailable() {
+     return fs.existsSync(HUB_BROWSER_SH);
+ }
+ async function isCDPAvailable() {
+     return new Promise((resolve) => {
+         const req = http.get(`http://127.0.0.1:${CDP_PORT}/json/version`, (res) => {
+             let data = "";
+             res.on("data", (chunk) => (data += chunk));
+             res.on("end", () => resolve(res.statusCode === 200));
+         });
+         req.on("error", () => resolve(false));
+         req.setTimeout(3000, () => {
+             req.destroy();
+             resolve(false);
+         });
+     });
+ }
+ // ── Strategy Selection with Fallback ─────────────────────────────────
+ /** Pick the preferred strategy based on task type */
  export function selectStrategy(task = {}) {
      if (task.useUserBrowser || config.cdpUrl)
          return "cdp";
      if (task.interactive || task.multiStep)
          return "gateway";
+     return "hub-stealth";
+ }
+ /**
+  * Resolve the preferred strategy to one that's actually available.
+  * Cascades: gateway → cdp → hub-stealth → cli
+  */
+ export async function resolveStrategy(preferred) {
+     const chain = [];
+     // Build fallback chain starting from preferred
+     switch (preferred) {
+         case "gateway":
+             chain.push("gateway", "cdp", "hub-stealth", "cli");
+             break;
+         case "cdp":
+             chain.push("cdp", "hub-stealth", "cli");
+             break;
+         case "hub-stealth":
+             chain.push("hub-stealth", "cli");
+             break;
+         case "cli":
+             chain.push("cli");
+             break;
+     }
+     for (const strategy of chain) {
+         switch (strategy) {
+             case "gateway":
+                 if (isGatewayScriptPresent() && (await isGatewayRunning()))
+                     return "gateway";
+                 if (!isGatewayScriptPresent()) {
+                     log("Gateway unavailable: browse-server.cjs not found. Falling back.");
+                 }
+                 else {
+                     log("Gateway not running. Falling back.");
+                 }
+                 break;
+             case "cdp":
+                 if (await isCDPAvailable())
+                     return "cdp";
+                 // Try starting CDP via hub script
+                 if (isHubBrowserAvailable()) {
+                     try {
+                         log("CDP Chrome not running - attempting to start via hub browser.sh...");
+                         execSync(`"${HUB_BROWSER_SH}" cdp start headless`, {
+                             stdio: "pipe",
+                             timeout: 15_000,
+                         });
+                         // Give it a moment to spin up
+                         await new Promise((r) => setTimeout(r, 3000));
+                         if (await isCDPAvailable()) {
+                             log("CDP Chrome started successfully.");
+                             return "cdp";
+                         }
+                     }
+                     catch (err) {
+                         log(`Failed to start CDP Chrome: ${err.message}`);
+                     }
+                 }
+                 log("CDP unavailable. Falling back.");
+                 break;
+             case "hub-stealth":
+                 if (isHubBrowserAvailable())
+                     return "hub-stealth";
+                 log("Hub browser.sh not found. Falling back to raw Playwright.");
+                 break;
+             case "cli":
+                 return "cli"; // Always available as last resort
+         }
+     }
      return "cli";
  }
- // ── Gateway Management ────────────────────────────────────────────────
+ function execHub(args) {
+     try {
+         const result = execSync(`"${HUB_BROWSER_SH}" ${args}`, {
+             stdio: "pipe",
+             timeout: EXEC_TIMEOUT,
+             env: { ...process.env, PATH: process.env.PATH },
+         });
+         const stdout = result.toString().trim();
+         // Try to parse as JSON (stealth outputs JSON)
+         try {
+             return JSON.parse(stdout);
+         }
+         catch {
+             // Not JSON - return as raw text
+             return { title: "", url: "", raw: stdout };
+         }
+     }
+     catch (err) {
+         const error = err;
+         log(`Hub script failed: ${error.stderr?.toString()?.trim() || error.message}`);
+         return null;
+     }
+ }
+ // ── Gateway Management ───────────────────────────────────────────────
  let gatewayProcess = null;
- async function gatewayRequest(path, params = {}) {
+ async function gatewayRequest(urlPath, params = {}, timeoutMs = 15_000) {
      const query = new URLSearchParams(params).toString();
-     const url = `http://127.0.0.1:${config.browseServerPort}${path}${query ? "?" + query : ""}`;
+     const url = `http://127.0.0.1:${config.browseServerPort}${urlPath}${query ? "?" + query : ""}`;
      return new Promise((resolve, reject) => {
-         http.get(url, (res) => {
+         const req = http.get(url, (res) => {
              let data = "";
-             res.on("data", chunk => data += chunk);
+             res.on("data", (chunk) => (data += chunk));
              res.on("end", () => {
                  try {
                      resolve(JSON.parse(data));
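The cascade in `resolveStrategy` has two parts: build an ordered chain that starts at the preferred strategy, then return the first entry whose availability check passes, with `"cli"` as the unconditional last resort. The sketch below separates those parts and injects the probes so the logic can run without a gateway, Chrome, or `browser.sh`. `ORDER`, `buildChain`, `resolveWith`, and the `probes` object are illustrative names, not exports of alvin-bot. The sketch also omits one step the shipped code performs: before giving up on CDP, it tries `browser.sh cdp start headless` and re-probes after 3 s.

```javascript
// Sketch of the cascade in resolveStrategy(), with availability probes
// injected so it runs without a gateway, Chrome, or browser.sh.
const ORDER = ["gateway", "cdp", "hub-stealth", "cli"];

// The chain starts at the preferred strategy and keeps everything after it,
// e.g. "cdp" -> ["cdp", "hub-stealth", "cli"]. Unknown names give [].
function buildChain(preferred) {
    const start = ORDER.indexOf(preferred);
    return start === -1 ? [] : ORDER.slice(start);
}

// Return the first strategy whose probe succeeds; "cli" needs no probe.
async function resolveWith(preferred, probes, log = () => {}) {
    for (const strategy of buildChain(preferred)) {
        if (strategy === "cli") return "cli";
        if (await probes[strategy]()) return strategy;
        log(`${strategy} unavailable, falling back`);
    }
    return "cli"; // an empty chain (unknown preference) still degrades to CLI
}

// Usage: the gateway is down but CDP Chrome answers, so "cdp" wins after
// one warning (the outcome the changelog reports for resolveStrategy('gateway')).
const probes = {
    gateway: async () => false,
    cdp: async () => true,
    "hub-stealth": async () => true,
};
resolveWith("gateway", probes, console.warn).then((s) => console.log(s)); // cdp
```

Because the chain only ever moves forward, a caller that prefers `hub-stealth` is never silently upgraded to the gateway, even when the gateway happens to be running.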
@@ -37,107 +169,270 @@ async function gatewayRequest(path, params = {}) {
                      reject(new Error(`Invalid JSON from gateway: ${data.slice(0, 200)}`));
                  }
              });
-         }).on("error", reject);
+         });
+         req.on("error", reject);
+         req.setTimeout(timeoutMs, () => {
+             req.destroy(new Error(`Gateway request timed out after ${timeoutMs}ms: ${urlPath}`));
+         });
      });
  }
  async function ensureGateway() {
      // Check if already running
-     try {
-         const health = await gatewayRequest("/health");
-         if (health.ok)
-             return true;
-     }
-     catch { /* not running */ }
-     // Start it
-     if (!fs.existsSync(BROWSE_SERVER_SCRIPT))
+     if (await isGatewayRunning())
+         return true;
+     // Try to start it
+     if (!isGatewayScriptPresent()) {
+         log("Cannot start gateway: browse-server.cjs not found.");
          return false;
+     }
      gatewayProcess = spawn("node", [BROWSE_SERVER_SCRIPT, String(config.browseServerPort)], {
          stdio: "pipe",
          detached: false,
      });
-     gatewayProcess.on("exit", () => { gatewayProcess = null; });
+     gatewayProcess.on("exit", () => {
+         gatewayProcess = null;
+     });
      // Wait for startup (max 10s)
      for (let i = 0; i < 20; i++) {
-         await new Promise(r => setTimeout(r, 500));
-         try {
-             const health = await gatewayRequest("/health");
-             if (health.ok)
-                 return true;
-         }
-         catch { /* still starting */ }
+         await new Promise((r) => setTimeout(r, 500));
+         if (await isGatewayRunning())
+             return true;
      }
+     log("Gateway failed to start within 10s.");
      return false;
  }
- // ── Unified Operations ────────────────────────────────────────────────
- /** Navigate to URL using best strategy */
+ // ── Unified Operations ───────────────────────────────────────────────
+ /** Navigate to URL using best available strategy */
  export async function navigate(url, task = {}) {
-     const strategy = selectStrategy(task);
-     if (strategy === "gateway") {
-         await ensureGateway();
-         return gatewayRequest("/navigate", { url });
-     }
-     if (strategy === "cdp") {
-         // CDP: use playwright connectOverCDP
-         const { chromium } = await import("playwright");
-         const browser = await chromium.connectOverCDP(config.cdpUrl);
-         const contexts = browser.contexts();
-         const page = contexts[0]?.pages()[0] || await contexts[0]?.newPage() || await browser.newPage();
-         await page.goto(url, { waitUntil: "networkidle", timeout: 30000 });
-         const title = await page.title();
-         return { title, url: page.url() };
-     }
-     // CLI: simple text extraction
-     const text = await extractText(url);
-     return { title: url, url, tree: [text.slice(0, 500)] };
+     const strategy = await resolveStrategy(selectStrategy(task));
+     log(`navigate(${url}) using strategy: ${strategy}`);
+     switch (strategy) {
+         case "gateway": {
+             await ensureGateway();
+             return gatewayRequest("/navigate", { url });
+         }
+         case "cdp": {
+             // Try hub CDP first
+             if (isHubBrowserAvailable()) {
+                 const result = execHub(`cdp goto "${url}"`);
+                 if (result && !result.error) {
+                     return { title: result.title || "", url: result.url || url };
+                 }
+             }
+             // Fallback: direct Playwright CDP
+             try {
+                 const { chromium } = await import("playwright");
+                 const browser = await chromium.connectOverCDP(config.cdpUrl || `http://127.0.0.1:${CDP_PORT}`);
+                 const contexts = browser.contexts();
+                 const page = contexts[0]?.pages()[0] || (await contexts[0]?.newPage()) || (await browser.newPage());
+                 await page.goto(url, { waitUntil: "networkidle", timeout: 30000 });
+                 const title = await page.title();
+                 return { title, url: page.url() };
+             }
+             catch (err) {
+                 log(`Direct CDP failed: ${err.message}`);
+                 // Last resort: try stealth
+                 if (isHubBrowserAvailable()) {
+                     const stealthResult = execHub(`stealth "${url}"`);
+                     if (stealthResult) {
+                         return { title: stealthResult.title || "", url: stealthResult.url || url };
+                     }
+                 }
+                 throw err;
+             }
+         }
+         case "hub-stealth": {
+             const result = execHub(`stealth "${url}"`);
+             if (result && !result.error) {
+                 return { title: result.title || "", url: result.url || url };
+             }
+             // Fallback to raw CLI
+             log("Hub stealth failed, falling back to raw Playwright.");
+             const text = await extractText(url);
+             return { title: url, url, tree: [text.slice(0, 500)] };
+         }
+         case "cli":
+         default: {
+             const text = await extractText(url);
+             return { title: url, url, tree: [text.slice(0, 500)] };
+         }
+     }
  }
  /** Take a screenshot */
  export async function screenshot(url, options = {}) {
-     const strategy = selectStrategy();
-     if (strategy === "gateway") {
-         await ensureGateway();
-         if (url)
-             await gatewayRequest("/navigate", { url });
-         const result = await gatewayRequest("/screenshot", options.fullPage ? { full: "true" } : {});
-         return result.path;
-     }
-     // CLI fallback
-     return screenshotUrl(url, { fullPage: options.fullPage });
- }
- /** Get accessibility tree (gateway only) */
+     const strategy = await resolveStrategy(selectStrategy());
+     log(`screenshot(${url}) using strategy: ${strategy}`);
+     switch (strategy) {
+         case "gateway": {
+             await ensureGateway();
+             if (url)
+                 await gatewayRequest("/navigate", { url });
+             const result = await gatewayRequest("/screenshot", options.fullPage ? { full: "true" } : {});
+             return result.path;
+         }
+         case "cdp": {
+             if (isHubBrowserAvailable()) {
+                 const tmpName = `shot_${Date.now()}.png`;
+                 const result = execHub(`cdp shot "${url}" ${tmpName}`);
+                 if (result?.screenshot)
+                     return result.screenshot;
+             }
+             // Fallback to raw Playwright
+             return screenshotUrl(url, { fullPage: options.fullPage });
+         }
+         case "hub-stealth": {
+             const tmpName = `shot_${Date.now()}.png`;
+             const result = execHub(`stealth "${url}" --screenshot=${tmpName}`);
+             if (result?.screenshot)
+                 return result.screenshot;
+             // Fallback
+             return screenshotUrl(url, { fullPage: options.fullPage });
+         }
+         case "cli":
+         default:
+             return screenshotUrl(url, { fullPage: options.fullPage });
+     }
+ }
+ // ── CDP Direct-Playwright Helper ─────────────────────────────────────
+ // Used as fallback when the gateway isn't running but CDP Chrome is.
+ // Each call opens a short-lived CDP connection, operates on the newest
+ // existing page in the current context (keeps Chrome itself alive), and
+ // disconnects. Safe for sub-agents that need a single op at a time.
+ async function withCdpPage(fn) {
+     const { chromium } = await import("playwright");
+     const browser = await chromium.connectOverCDP(config.cdpUrl || `http://127.0.0.1:${CDP_PORT}`);
+     try {
+         const context = browser.contexts()[0];
+         if (!context)
+             throw new Error("No CDP contexts available - is Chrome CDP running?");
+         const pages = context.pages();
+         const page = pages[pages.length - 1] || (await context.newPage());
+         return await fn(page);
+     }
+     finally {
+         await browser.close(); // Closes CDP connection, not Chrome itself
+     }
+ }
+ const NEEDS_INTERACTIVE_HINT = "Start CDP Chrome: ~/.claude/hub/SCRIPTS/browser.sh cdp start headless";
+ /**
+  * Get accessibility tree (gateway preferred, CDP fallback returns outerHTML).
+  * The @eN ref model only exists in the gateway; under CDP we return a
+  * best-effort DOM snippet instead so callers can still see what's there.
+  */
  export async function getTree(limit = 100) {
-     await ensureGateway();
-     return gatewayRequest("/tree", { limit: String(limit) });
- }
- /** Click element by ref (gateway only) */
- export async function click(ref) {
-     await ensureGateway();
-     return gatewayRequest("/click", { ref });
- }
- /** Fill input (gateway only) */
- export async function fill(ref, value) {
-     await ensureGateway();
-     await gatewayRequest("/fill", { ref, value });
- }
- /** Type text (gateway only) */
- export async function type(ref, text) {
-     await ensureGateway();
-     await gatewayRequest("/type", { ref, text });
- }
- /** Press key (gateway only) */
- export async function press(key, ref) {
-     await ensureGateway();
-     await gatewayRequest("/press", ref ? { key, ref } : { key });
- }
- /** Scroll page (gateway only) */
+     if (await isGatewayRunning()) {
+         return gatewayRequest("/tree", { limit: String(limit) });
+     }
+     if (await isCDPAvailable()) {
+         return withCdpPage(async (page) => {
+             const elements = await page.$$eval("a, button, input, select, textarea, [role=button], [role=link]", (els, max) => els.slice(0, max).map((el, i) => {
+                 const tag = el.tagName.toLowerCase();
+                 const text = (el.textContent || "").trim().slice(0, 60);
+                 const id = el.id ? `#${el.id}` : "";
+                 const name = el.name
+                     ? `[name=${el.name}]`
+                     : "";
+                 return `@e${i + 1} <${tag}${id}${name}> "${text}"`;
+             }), limit);
+             return { tree: elements, total: elements.length };
+         });
+     }
+     throw new Error(`[browser-manager] getTree requires gateway or CDP. ${NEEDS_INTERACTIVE_HINT}`);
+ }
+ /**
+  * Click an element. Accepts a gateway ref (@eN → "eN") when gateway is
+  * running, or a CSS selector when only CDP is available.
+  */
+ export async function click(refOrSelector) {
+     if (await isGatewayRunning()) {
+         return gatewayRequest("/click", { ref: refOrSelector });
+     }
+     if (await isCDPAvailable()) {
+         return withCdpPage(async (page) => {
+             await page.click(refOrSelector, { timeout: 10_000 });
+             return { title: await page.title(), url: page.url() };
+         });
+     }
+     throw new Error(`[browser-manager] click() requires gateway or CDP. ${NEEDS_INTERACTIVE_HINT}`);
+ }
+ /** Fill an input. refOrSelector semantics match click(). */
+ export async function fill(refOrSelector, value) {
+     if (await isGatewayRunning()) {
+         await gatewayRequest("/fill", { ref: refOrSelector, value });
+         return;
+     }
+     if (await isCDPAvailable()) {
+         await withCdpPage(async (page) => {
+             await page.fill(refOrSelector, value, { timeout: 10_000 });
+         });
+         return;
+     }
+     throw new Error(`[browser-manager] fill() requires gateway or CDP. ${NEEDS_INTERACTIVE_HINT}`);
+ }
+ /** Type text character-by-character (for inputs that reject page.fill). */
+ export async function type(refOrSelector, text) {
+     if (await isGatewayRunning()) {
+         await gatewayRequest("/type", { ref: refOrSelector, text });
+         return;
+     }
+     if (await isCDPAvailable()) {
+         await withCdpPage(async (page) => {
+             await page.type(refOrSelector, text, { timeout: 10_000 });
+         });
+         return;
+     }
+     throw new Error(`[browser-manager] type() requires gateway or CDP. ${NEEDS_INTERACTIVE_HINT}`);
+ }
+ /** Press a keyboard key (page-level if no ref, element-level with ref). */
+ export async function press(key, refOrSelector) {
+     if (await isGatewayRunning()) {
+         await gatewayRequest("/press", refOrSelector ? { key, ref: refOrSelector } : { key });
+         return;
+     }
+     if (await isCDPAvailable()) {
+         await withCdpPage(async (page) => {
+             if (refOrSelector) {
+                 await page.locator(refOrSelector).press(key, { timeout: 10_000 });
+             }
+             else {
+                 await page.keyboard.press(key);
+             }
+         });
+         return;
+     }
+     throw new Error(`[browser-manager] press() requires gateway or CDP. ${NEEDS_INTERACTIVE_HINT}`);
+ }
+ /** Scroll page. CDP fallback uses window.scrollBy. */
  export async function scroll(direction, amount = 600) {
-     await ensureGateway();
-     return gatewayRequest("/scroll", { direction, amount: String(amount) });
+     if (await isGatewayRunning()) {
+         return gatewayRequest("/scroll", { direction, amount: String(amount) });
+     }
+     if (await isCDPAvailable()) {
+         return withCdpPage(async (page) => {
+             const delta = direction === "up" ? -amount :
+                 direction === "top" ? -1e9 :
+                     direction === "bottom" ? 1e9 :
+                         amount;
+             await page.evaluate((d) => window.scrollBy(0, d), delta);
+             return { title: await page.title(), url: page.url() };
+         });
+     }
+     throw new Error(`[browser-manager] scroll() requires gateway or CDP. ${NEEDS_INTERACTIVE_HINT}`);
  }
- /** Evaluate JS (gateway only) */
+ /** Evaluate JS in the page context. */
  export async function evaluate(js) {
-     await ensureGateway();
-     const result = await gatewayRequest("/eval", { js });
-     return result.result;
+     if (await isGatewayRunning()) {
+         const result = await gatewayRequest("/eval", { js });
+         return result.result;
+     }
+     if (await isCDPAvailable()) {
+         return withCdpPage(async (page) => {
+             // `page.evaluate(fn)` wraps a function - we need eval of a raw
+             // expression string, so wrap in an IIFE.
+             return page.evaluate(new Function(`return (${js})`));
+         });
+     }
+     throw new Error(`[browser-manager] evaluate() requires gateway or CDP. ${NEEDS_INTERACTIVE_HINT}`);
  }
  /** Generate PDF from URL */
  export async function pdf(url) {
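In the CDP fallback, `evaluate(js)` receives a raw expression string, while `page.evaluate` is called with a function. The diff bridges the two with ``new Function(`return (${js})`)``. Strictly speaking this is a Function-constructor wrapper rather than the IIFE the inline comment mentions: nothing runs until Playwright evaluates the function in the page. The sketch below shows the same wrapping locally in Node, without a browser; `wrapExpression` is a hypothetical name. Because the string is compiled as code, only trusted expressions should be passed in.

```javascript
// Hypothetical helper mirroring the CDP branch of evaluate(): turn a raw
// expression string into a zero-argument function that returns its value.
// The parentheses make the text parse as a single expression, so a
// statement list such as "let x = 1; x" is rejected with a SyntaxError.
function wrapExpression(js) {
    return new Function(`return (${js})`);
}

// Locally (no browser) the wrapper can simply be called:
console.log(wrapExpression("1 + 2 * 3")()); // 7
console.log(wrapExpression("[1, 2, 3].map((n) => n * 2)")()); // [ 2, 4, 6 ]

try {
    wrapExpression("let x = 1; x");
} catch (err) {
    console.log(err instanceof SyntaxError); // true
}
```

The SyntaxError is raised when the function is constructed, so in the CDP path a malformed expression fails before anything reaches the page.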
@@ -154,8 +449,16 @@ export async function close() {
          gatewayProcess = null;
      }
  }
- /** Get current page info (gateway) */
+ /** Get current page info (gateway preferred, CDP fallback reads newest page). */
  export async function info() {
-     await ensureGateway();
-     return gatewayRequest("/info");
+     if (await isGatewayRunning()) {
+         return gatewayRequest("/info");
+     }
+     if (await isCDPAvailable()) {
+         return withCdpPage(async (page) => ({
+             title: await page.title(),
+             url: page.url(),
+         }));
+     }
+     throw new Error(`[browser-manager] info() requires gateway or CDP. ${NEEDS_INTERACTIVE_HINT}`);
  }
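Every interactive operation in the new `browser-manager` (`getTree`, `click`, `fill`, `type`, `press`, `scroll`, `evaluate`, `info`) repeats the same three-step shape: use the gateway if its `/health` check answers, otherwise open a short-lived CDP connection, otherwise throw an error carrying `NEEDS_INTERACTIVE_HINT`. The sketch below factors that shape into one hypothetical helper, `dispatch`, with the probes and both backends injected. It illustrates the pattern and is not code that ships in 4.8.9.

```javascript
// Hypothetical refactoring of the repeated gateway -> CDP -> error shape.
// The probes and both backends are injected, so this runs without browsers.
const HINT = "Start CDP Chrome: ~/.claude/hub/SCRIPTS/browser.sh cdp start headless";

async function dispatch(opName, env, viaGateway, viaCdp) {
    if (await env.isGatewayRunning()) return viaGateway();
    if (await env.isCDPAvailable()) return viaCdp();
    throw new Error(`[browser-manager] ${opName}() requires gateway or CDP. ${HINT}`);
}

// Usage with stubbed probes: only CDP is up, so the CDP backend runs.
const env = {
    isGatewayRunning: async () => false,
    isCDPAvailable: async () => true,
};
dispatch(
    "info",
    env,
    async () => ({ source: "gateway" }),
    async () => ({ source: "cdp" }),
).then((result) => console.log(result.source)); // cdp
```

Passing the backends as closures keeps the per-operation differences (gateway ref vs. CSS selector, `/fill` vs. `page.fill`) at the call site, while the probe order and error text live in one place. A shared helper would also remove a small inconsistency: the shipped `getTree` message reads `getTree requires ...` without the `()` the other operations use.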
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "alvin-bot",
-   "version": "4.8.8",
+   "version": "4.8.9",
    "description": "Alvin Bot - Your personal AI agent on Telegram, WhatsApp, Discord, Signal, and Web.",
    "type": "module",
    "main": "dist/index.js",
@@ -1,136 +1,161 @@
1
1
  ---
2
2
  name: Browser Automation
3
- description: Interactive browser control β€” navigate, click, fill forms, screenshot, test web apps
4
- triggers: browse, browser, test webapp, test app, test website, screenshot page, interact with, click on, fill form, visual test, qa test, check page, open page, test my app, browse to, open url, puppeteer, playwright, browser automation, test die seite, teste die app, schau dir an, ΓΆffne die seite, teste mal, visual check, check the ui, check the page
3
+ description: 3-tier browser control β€” stealth scraping, CDP with persistent cookies, visual oversight. Navigate, screenshot, extract text, interact with logged-in pages.
4
+ triggers: browse, browser, test webapp, test app, test website, screenshot page, interact with, click on, fill form, visual test, qa test, check page, open page, test my app, browse to, open url, puppeteer, playwright, browser automation, linkedin, stepstone, indeed, scrape, fetch page, crawl, teste die seite, teste die app, schau dir an, ΓΆffne die seite, teste mal, visual check, check the ui, check the page, webseite ΓΆffnen, seite abrufen
5
5
  priority: 8
6
6
  category: automation
7
7
  ---
8
8
 
9
- # Browser Automation β€” Playwright Interactive
9
+ # Browser Automation β€” 3-Tier Router
10
10
 
11
- ## Browser Strategies
11
+ Du hast drei Browser-Strategien plus WebFetch. **WΓ€hle die billigste passende Stufe** und eskaliere nur wenn nΓΆtig.
12
12
 
13
- Alvin Bot auto-selects the best browser approach:
13
+ ## Entscheidungsregel (in dieser Reihenfolge)
14
14
 
15
- | Strategy | When | How |
16
- |----------|------|-----|
17
- | **CLI** (default) | Simple screenshots, text extraction, PDF | Headless Playwright, one-shot |
18
- | **HTTP Gateway** | Interactive browsing, form-filling, QA testing | Persistent browser server on port 3800 |
19
- | **CDP** | Attach to user's Chrome (with login state) | Chrome DevTools Protocol via CDP_URL |
15
+ | Task | Tool | Warum |
16
+ |------|------|-------|
17
+ | Einzelne ΓΆffentliche Seite, nur Text | WebFetch oder `curl` | Am schnellsten, keine Browser-Engine |
18
+ | Γ–ffentliche Seite mit JS / Cloudflare | **Tier 1 Stealth** | Headless + Fingerprint-Masking |
19
+ | Login-pflichtige Seite (LinkedIn, Gmail, …) | **Tier 2 CDP** | Echtes Chrome, persistente Cookies |
20
+ | Komplexer Multi-Step-Flow, User soll zusehen | **Tier 3 Extension** | Visuelle Kontrolle |
20
21
 
21
- The gateway starts automatically when needed and shuts down after 5 min idle.
22
- For CDP: Launch Chrome with `--remote-debugging-port=9222` and set `CDP_URL=http://localhost:9222`.
22
+ **NIEMALS** `scripts/browse-server.cjs` nutzen β€” existiert nicht mehr. **NIEMALS** nacktes `node -e "const {chromium}…"` fΓΌr externe Seiten β€” wird sofort geblockt.
23
23
 
24
24
  ---
25
25
 
26
- You have a persistent Playwright browser server that gives you **eyes** and **hands** to interact with web pages. You can navigate, see screenshots, read the accessibility tree, click buttons, fill forms, and test running web apps.
26
+ ## Tier 0 β€” WebFetch / curl (schnellster Pfad)
27
27
 
28
- ## Quick Start
28
+ FΓΌr statische Seiten oder APIs, die keine JS-Rendering brauchen:
29
29
 
30
30
  ```bash
31
- # 1. Ensure server is running (auto-shuts down after 5 min idle)
32
- curl -s http://127.0.0.1:3800/health 2>/dev/null | grep -q '"ok":true' || \
33
- (BOT_DIR=$(node -e "console.log(require('path').resolve(require.resolve('alvin-bot/package.json'), '..'))" 2>/dev/null || echo ".") && cd "$BOT_DIR" && node scripts/browse-server.cjs &) && sleep 3
31
+ # Direkter curl
32
+ curl -sL -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36" \
33
+ "https://www.michaelpage.de/jobs/it-director" | htmlq -t "h1, .job-title"
34
34
 
35
- # 2. Navigate to a page
36
- curl -s "http://127.0.0.1:3800/navigate?url=https://example.com" | jq
35
+ # Oder das WebFetch-Tool, wenn verfΓΌgbar (interpretiert Inhalt direkt)
36
+ ```
37
37
 
38
- # 3. Take a screenshot (view it with Read tool)
39
- SHOT=$(curl -s "http://127.0.0.1:3800/screenshot" | jq -r '.path')
40
- # Then use Read tool on $SHOT to see the image
38
+ Wenn das einen 403/Captcha gibt β†’ eskaliere auf Tier 1.
 
- # 4. Get interactive elements
- curl -s "http://127.0.0.1:3800/tree" | jq '.tree[]' -r
+ ---
 
- # 5. Click something
- curl -s "http://127.0.0.1:3800/click?ref=e5" | jq
- ```
+ ## Tier 1: Playwright Stealth (headless, fast, masked)
+
+ **Router script:** `~/.claude/hub/SCRIPTS/browser.sh`
+
+ ```bash
+ # Load the page; returns JSON metadata (title, url, html_length)
+ ~/.claude/hub/SCRIPTS/browser.sh stealth "https://www.stepstone.de/jobs/it-delivery"
 
- ## All Routes
-
- | Route | Params | What it does |
- |-------|--------|-------------|
- | `/navigate` | `url` | Open a URL, returns title + accessibility tree |
- | `/screenshot` | `full=true` (optional) | Take screenshot, returns file path |
- | `/tree` | `limit=N` (optional) | Get all interactive elements with @eN refs |
- | `/click` | `ref=eN` | Click element by ref |
- | `/fill` | `ref=eN`, `value=text` | Fill input field |
- | `/type` | `ref=eN`, `text=chars` | Type character by character (for special inputs) |
- | `/press` | `key=Enter`, `ref=eN` (opt) | Press keyboard key |
- | `/select` | `ref=eN`, `value=opt` | Select dropdown option |
- | `/hover` | `ref=eN` | Hover over element |
- | `/scroll` | `direction=down/up/top/bottom`, `amount=600` | Scroll page |
- | `/eval` | `js=expression` | Run JavaScript on page |
- | `/wait` | `ms=2000` or `selector=.class` | Wait for time or element |
- | `/viewport` | `device=mobile/tablet` or `width=W&height=H` | Change viewport |
- | `/cookies` | `set=[{...}]` (optional) | Get or set cookies |
- | `/back` | - | Browser back |
- | `/forward` | - | Browser forward |
- | `/reload` | - | Reload page |
- | `/network` | `limit=20` | Recent network requests |
- | `/info` | - | Current page info |
- | `/close` | - | Close browser + shutdown server |
- | `/health` | - | Server status check |
-
- ## Element Refs (@eN)
-
- The accessibility tree assigns **refs** like `@e1`, `@e2`, `@e3` to every interactive element (links, buttons, inputs, etc.). Use these refs for all interactions: they're more robust than CSS selectors.
-
- Example tree:
+ # With a screenshot (PNG in ~/.claude/hub/BROWSER/screenshots/)
+ ~/.claude/hub/SCRIPTS/browser.sh stealth "https://example.com" --screenshot=page.png
  ```
- @e1 <a href="/"> "Home"
- @e2 <a href="/dashboard"> "Dashboard"
- @e3 <input type="email" name="email" placeholder="Enter email">
- @e4 <input type="password" name="password" placeholder="Password">
- @e5 <button> "Sign In"
- @e6 <a href="/forgot"> "Forgot password?"
+
+ **What you get:** JSON with `{title, url, html_length, screenshot}`. The full HTML is not written to stdout; to parse it, import `stealth.js` directly as a module or read the `/tmp/` file.
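Since only metadata reaches stdout, `html_length` is a cheap signal of whether Tier 1 got real content or a block page. A minimal sketch, assuming the JSON is printed on stdout with a numeric `html_length` field; the helper names and the 2000-byte threshold are hypothetical. It uses `sed` instead of `jq` so no extra tool is required.

```bash
# Hypothetical helper: print the numeric html_length from the JSON metadata
# that `browser.sh stealth` writes to stdout (prints nothing if absent).
stealth_html_length() {
  printf '%s' "$1" | tr -d '\n' |
    sed -n 's/.*"html_length"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical check: a missing or very short html_length suggests a block
# page (captcha, login wall). The 2000-byte default is an assumption.
stealth_looks_blocked() {
  local len
  len="$(stealth_html_length "$1")"
  [ -z "$len" ] || [ "$len" -lt "${2:-2000}" ]
}
```

For example, `meta="$(~/.claude/hub/SCRIPTS/browser.sh stealth "$url")"; stealth_looks_blocked "$meta" && echo "escalate to Tier 2"`.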
+
+ **When it gets blocked:** reCAPTCHA v3, aggressive Cloudflare, login walls.
+
+ **Concrete working targets (as of 2026):**
+ - StepStone (all job searches) ✅
+ - Michael Page ✅
+ - Hays ✅
+ - Public blog posts, news sites ✅
+ - LinkedIn (without login) ❌ → Tier 2
+ - Indeed / Glassdoor ❌ (403 scraping block) → only via e-mail alerts
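The target list can be encoded as a lookup so a cron job starts at the right tier instead of re-walking the ladder on every run. A minimal sketch; `start_tier_for` is a hypothetical name, the host patterns are derived from the list above (`hays.de` is an assumed domain), and any unlisted host falls back to Tier 0.

```bash
# Hypothetical lookup: map a URL to the tier to try first, based on the
# "as of 2026" target list. Prints one of: tier0, tier1, tier2, email-only.
start_tier_for() {
  local host="${1#*://}"   # strip the scheme
  host="${host%%/*}"       # strip the path and everything after it
  host="${host%%:*}"       # strip a port, if any
  case "$host" in
    *stepstone.de|*michaelpage.de|*hays.de) echo tier1 ;;
    *linkedin.com)                          echo tier2 ;;
    *indeed.*|*glassdoor.*)                 echo email-only ;;
    *)                                      echo tier0 ;;
  esac
}
```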
+
+ ---
+
+ ## Tier 2: Chrome CDP (persistent profile, real cookies)
+
+ A real Chrome instance with its profile at `~/.claude/hub/BROWSER/profile/`. Login cookies for LinkedIn, Gmail, etc. persist across sessions.
+
+ ```bash
+ # Start once (checks whether it is already running)
+ ~/.claude/hub/SCRIPTS/browser.sh cdp start headless # headless: for cron/daemon
+ ~/.claude/hub/SCRIPTS/browser.sh cdp start headful # visible: when the user should watch
+
+ # Navigate
+ ~/.claude/hub/SCRIPTS/browser.sh cdp goto "https://www.linkedin.com/jobs/search/?keywords=IT+Director"
+
+ # Screenshot
+ ~/.claude/hub/SCRIPTS/browser.sh cdp shot "https://www.linkedin.com/feed/" linkedin_feed.png
+
+ # List tabs
+ ~/.claude/hub/SCRIPTS/browser.sh cdp tabs
+
+ # Stop (usually unnecessary; Chrome keeps running persistently)
+ ~/.claude/hub/SCRIPTS/browser.sh cdp stop
  ```
 
- To login:
+ **Login setup (one-time):** If LinkedIn is logged out, ask Ali via Telegram:
+ > "Please log in to LinkedIn once in Chrome (hub profile). The cookies will then persist."
+
+ Start with `cdp start headful` so Chrome opens visibly → Ali logs in → from then on the cookies stay in the profile.
+
+ **How to check whether you are logged in:** after `cdp goto`, inspect the URL. If the path contains `/authwall` or `/login`, you are logged out.
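The logged-out check can be scripted. A minimal sketch, assuming you already have the final URL after `cdp goto` as a string (how `browser.sh` reports it is not covered here); `cdp_is_logged_out` is a hypothetical name. The query string is dropped first so that a redirect parameter containing `/login` does not trigger a false positive.

```bash
# Hypothetical helper: return 0 (true) when the URL path indicates a
# logged-out session, i.e. it contains /authwall or /login.
cdp_is_logged_out() {
  local rest="${1#*://}" path="/"   # drop the scheme; default path is "/"
  case "$rest" in
    */*) path="/${rest#*/}" ;;      # keep everything after the host
  esac
  path="${path%%\?*}"               # ignore the query string
  case "$path" in
    */authwall*|*/login*) return 0 ;;
    *) return 1 ;;
  esac
}
```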
+
+ ---
+
+ ## Tier 3: Claude-in-Chrome extension (visual control)
+
+ Only in interactive CLI sessions, not in cron/daemon runs.
+
  ```bash
- curl -s "http://127.0.0.1:3800/fill?ref=e3&value=user@example.com"
- curl -s "http://127.0.0.1:3800/fill?ref=e4&value=mypassword"
- curl -s "http://127.0.0.1:3800/click?ref=e5"
+ # Check whether the extension is connected
+ ~/.claude/hub/SCRIPTS/browser.sh ext check
+
+ # Then load the MCP tools via ToolSearch:
+ # mcp__claude-in-chrome__tabs_context_mcp
+ # mcp__claude-in-chrome__navigate
+ # mcp__claude-in-chrome__computer
  ```
 
- ## Standard Workflow: Test a Web App
+ **When to use:** drag and drop, complex UIs, or when the user should watch live and be able to intervene.
 
- 1. **Start** the browse server if not running
- 2. **Navigate** to the app URL
- 3. **Screenshot** → view with Read tool to see current state
- 4. **Tree** → see all interactive elements
- 5. **Interact** (click, fill, press) using @eN refs
- 6. **Screenshot** again to verify the result
- 7. **Repeat** for each test step
- 8. **Report** findings to the user
- 9. **Close** when done
+ ---
 
- ## Mobile Testing
+ ## Escalation rule (MANDATORY)
+
+ ```
+ Public text page → Tier 0 (WebFetch/curl)
+ ↓ 403 / Cloudflare / empty HTML?
+ Tier 1 (stealth) → browser.sh stealth <url>
+ ↓ Captcha / login wall?
+ Tier 2 (CDP) → cdp start headless/headful + cdp goto <url>
+ ↓ Cookies missing?
+ Ask Ali: "Please log in to X once in Chrome, then I can continue."
+ ```
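The ladder can be driven from a cron script. A minimal sketch, assuming `browser.sh` exits non-zero when a tier fails; its exit codes are not documented here, so the `try_tier*` functions may need stricter checks (for example inspecting `html_length` or the final URL). `BROWSER_SH`, `try_tier0/1/2` and `escalate` are hypothetical names.

```bash
# Assumed location of the hub router script; override via the environment.
BROWSER_SH="${BROWSER_SH:-$HOME/.claude/hub/SCRIPTS/browser.sh}"

# Tier 0: plain curl; -f turns HTTP errors such as 403 into a non-zero exit.
try_tier0() { curl -sfL -o /dev/null "$1"; }

# Tier 1: Playwright stealth (assumption: non-zero exit when blocked).
try_tier1() { "$BROWSER_SH" stealth "$1" >/dev/null; }

# Tier 2: persistent CDP Chrome, always headless in cron/daemon contexts.
try_tier2() {
  "$BROWSER_SH" cdp start headless >/dev/null && "$BROWSER_SH" cdp goto "$1" >/dev/null
}

# Walk the ladder in order; print the first tier that worked, else ask-ali.
escalate() {
  local url="$1" tier
  for tier in 0 1 2; do
    if "try_tier$tier" "$url"; then
      echo "tier$tier"
      return 0
    fi
    echo "tier$tier failed, escalating" >&2
  done
  echo "ask-ali"   # cookies missing: ask Ali to log in once in the hub Chrome
  return 1
}
```

Each tier is a separate function so a test or a stricter check can replace one tier without touching the loop.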
+
+ **NEVER give up with "Browser doesn't work".** There is always a next step. Report honestly, e.g. "Tier 1 is blocked by a captcha, trying Tier 2", rather than just "Failed to load".
+
+ ## Status checks
 
  ```bash
- # Switch to mobile viewport
- curl -s "http://127.0.0.1:3800/viewport?device=mobile"
- curl -s "http://127.0.0.1:3800/screenshot" | jq -r '.path'
- # Switch back to desktop
- curl -s "http://127.0.0.1:3800/viewport?width=1280&height=720"
+ # Overview of all tiers + health
+ ~/.claude/hub/SCRIPTS/browser.sh status
+
+ # Is the CDP Chrome currently listening on port 9222?
+ curl -s http://127.0.0.1:9222/json/version | head -c 200
  ```
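`/json/version` is a standard Chrome DevTools Protocol HTTP endpoint; its JSON includes fields such as `Browser` and `webSocketDebuggerUrl`. A minimal sketch of a readiness probe that polls the port and then prints the WebSocket URL a CDP client would connect to; the function names and retry defaults are hypothetical.

```bash
# Hypothetical helper: print webSocketDebuggerUrl from a /json/version payload.
cdp_ws_url_from_json() {
  printf '%s' "$1" | tr -d '\n' |
    sed -n 's/.*"webSocketDebuggerUrl"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical probe: poll the CDP port until Chrome answers, then print the
# WebSocket URL. Returns 1 if nothing answers within the given number of tries.
cdp_ws_url() {
  local port="${1:-9222}" tries="${2:-10}" i=0 json
  while [ "$i" -lt "$tries" ]; do
    if json="$(curl -sf "http://127.0.0.1:${port}/json/version")"; then
      cdp_ws_url_from_json "$json"
      return 0
    fi
    i=$((i + 1))
    sleep 0.5
  done
  return 1
}
```

For example, `ws="$(cdp_ws_url 9222 20)" || echo "CDP Chrome not reachable, run browser.sh cdp start headless"`.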
 
- ## Auth / Cookie Injection
+ ## Viewing screenshot output
+
+ Screenshots given as a relative filename are saved under `~/.claude/hub/BROWSER/screenshots/`; an absolute path is used as given. Using the Read tool on that path displays the image directly.
+
+ ## Interactive ops (clicking, filling forms)
+
+ For simple cases: `cdp eval` with JavaScript that is executed in the page:
 
- For pages that need authentication:
  ```bash
- # Set cookies manually
- curl -s 'http://127.0.0.1:3800/cookies?set=[{"name":"session","value":"abc123","domain":"example.com","path":"/"}]'
- # Then navigate to the authenticated page
- curl -s "http://127.0.0.1:3800/navigate?url=https://example.com/dashboard"
+ ~/.claude/hub/SCRIPTS/browser.sh cdp eval "https://example.com/login" \
+ "document.querySelector('#username').value='test'; document.querySelector('#password').value='pw'; document.querySelector('form').submit();"
  ```
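Hard-coding values into the JavaScript breaks as soon as a value contains a quote. A minimal sketch of hypothetical helpers that escape backslashes and single quotes before building the `cdp eval` snippet; it does not handle newlines in values. The selectors `#username`, `#password` and `form` are the example ones from above and differ per site.

```bash
# Hypothetical helper: wrap a value in a single-quoted JavaScript string
# literal, escaping backslashes and single quotes (not newlines).
js_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed 's/[\\'"'"']/\\&/g')"
}

# Hypothetical builder for the login example: fill both fields, submit the form.
login_js() {
  printf "document.querySelector('#username').value=%s; document.querySelector('#password').value=%s; document.querySelector('form').submit();" \
    "$(js_quote "$1")" "$(js_quote "$2")"
}
```

Called as `browser.sh cdp eval "https://example.com/login" "$(login_js "$LOGIN_USER" "$LOGIN_PASS")"`, where `LOGIN_USER` and `LOGIN_PASS` are placeholder variables; `login_js test pw` yields exactly the snippet shown above.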
 
- ## Important Notes
+ For more complex flows (sequential clicks after DOM updates) → use Tier 3 (extension).
+
+ ## Important notes
 
- - **Server auto-shuts down** after 5 min idle; restart if needed
- - **One page at a time**: navigation replaces the current page
- - **Screenshots** are saved to `/tmp/alvin-bot/browse/`; view with Read tool
- - **127.0.0.1 only**: not accessible from outside
- - **URL-encode** values with special chars: `value=hello%20world`
- - **Refs reset** on every navigation/click; always get fresh /tree after page changes
- - For **local dev servers**: use `http://localhost:PORT` as the URL
+ - **CDP profile conflict:** Chrome cannot open `~/.claude/hub/BROWSER/profile/` twice. If Ali had it open locally, check port 9222 and run `cdp stop` followed by `cdp start`.
+ - **Headless vs. headful:** In cron/daemon runs (launchd), ALWAYS use `headless`; otherwise Chrome fails because there is no display.
+ - **After page navigation** (`cdp goto`), Playwright opens a new tab by default; `reuseTab` is not exposed. That is fine for single scrapes but can lead to a pile-up of tabs. `cdp stop` followed by a restart cleans up.
+ - **Persistence:** Cookies, LocalStorage, IndexedDB: everything lives in `~/.claude/hub/BROWSER/profile/` and persists fully across bot restarts.