PyPI - browser-ctl - Versions diffs - 0.2.4__tar.gz → 0.2.6__tar.gz - Mend

browser-ctl 0.2.4.tar.gz → 0.2.6.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24)

{browser_ctl-0.2.4 → browser_ctl-0.2.6}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: browser-ctl
-Version: 0.2.4
+Version: 0.2.6
 Summary: Control your browser from the command line via a Chrome extension + WebSocket bridge
 Author-email: geb <853934146@qq.com>
 License-Expression: MIT
@@ -68,6 +68,7 @@ Tools like [browser-use](https://github.com/browser-use/browser-use), [Playwrigh
 | **Complex SDK integration** — requires importing libraries and writing async code | browser-use, Stagehand | Pure CLI with JSON output — any LLM can call `bctl click "button"` |
 | **Heavy dependencies** — Playwright alone pulls ~50 MB of packages + browser binary | Playwright, Puppeteer | CLI is stdlib-only; server needs only `aiohttp` |
 | **Token-inefficient for LLMs** — verbose API calls waste context window tokens | SDK-based tools | Concise commands: `bctl text h1` vs pages of boilerplate |
+| **Broken clicks on SPAs** — programmatic clicks get blocked by popup blockers | Puppeteer, Playwright | Intercepts `window.open()` and navigates via `chrome.tabs` — SPA-compatible |
 <br>
@@ -218,6 +219,10 @@ All `<sel>` arguments accept CSS selectors **or** element refs from `snapshot` (
 | `bctl ping` | Check server & extension status |
 | `bctl serve` | Start server in foreground |
 | `bctl stop` | Stop server |
+| `bctl setup` | Install extension to `~/.browser-ctl/extension/` + open Chrome extensions page |
+| `bctl setup cursor` | Install AI skill (`SKILL.md`) into Cursor IDE |
+| `bctl setup opencode` | Install AI skill into OpenCode |
+| `bctl setup <path>` | Install AI skill to a custom directory |
 <br>
@@ -379,9 +384,10 @@ Non-zero exit code on errors — works naturally with `set -e` and `&&` chains.
 | Component | Details |
 |-----------|---------|
-| **CLI** | Stdlib only, communicates via HTTP |
+| **CLI** | Stdlib only, raw-socket HTTP (zero heavy imports, ~5ms cold start) |
 | **Bridge Server** | Async relay (aiohttp), auto-daemonizes |
 | **Extension** | MV3 service worker, auto-reconnects via `chrome.alarms` |
+| **Click** | Three-phase: pointer events → MAIN-world click → `window.open()` interception for SPA compatibility |
 | **Eval** | Dual strategy: MAIN-world injection (fast) + CDP fallback (CSP-safe) |
 <br>
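The CLI row above credits raw-socket HTTP with a ~5ms cold start, and the client.py docstring puts the `urllib` alternative at ~30ms. Those figures depend on the machine and on whether `.pyc` caches are warm, so one way to check them locally is CPython's `-X importtime` flag. The helper `cumulative_import_us` below is a hypothetical sketch, not part of browser-ctl:

```python
import subprocess
import sys


def cumulative_import_us(module: str) -> int:
    """Return the cumulative import time of `module` in microseconds.

    Runs a fresh interpreter with CPython's `-X importtime` flag, which writes
    one stderr line per newly imported module, shaped like
    "import time: <self us> | <cumulative us> | <indented module name>".
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        # parts == [self, cumulative, name]; the header row's name never matches
        if len(parts) == 3 and parts[2].strip() == module:
            return int(parts[1])
    # Modules already loaded at interpreter startup (e.g. sys) get no line.
    raise LookupError(f"{module} was not freshly imported")


if __name__ == "__main__":
    for name in ("socket", "urllib.request"):
        print(f"{name:15s} {cumulative_import_us(name) / 1000:.1f} ms")
```

This measures only module import cost; it says nothing about the HTTP round-trip to the bridge server.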

{browser_ctl-0.2.4 → browser_ctl-0.2.6}/README.md RENAMED

@@ -42,6 +42,7 @@ Tools like [browser-use](https://github.com/browser-use/browser-use), [Playwrigh
 | **Complex SDK integration** — requires importing libraries and writing async code | browser-use, Stagehand | Pure CLI with JSON output — any LLM can call `bctl click "button"` |
 | **Heavy dependencies** — Playwright alone pulls ~50 MB of packages + browser binary | Playwright, Puppeteer | CLI is stdlib-only; server needs only `aiohttp` |
 | **Token-inefficient for LLMs** — verbose API calls waste context window tokens | SDK-based tools | Concise commands: `bctl text h1` vs pages of boilerplate |
+| **Broken clicks on SPAs** — programmatic clicks get blocked by popup blockers | Puppeteer, Playwright | Intercepts `window.open()` and navigates via `chrome.tabs` — SPA-compatible |
 <br>
@@ -192,6 +193,10 @@ All `<sel>` arguments accept CSS selectors **or** element refs from `snapshot` (
 | `bctl ping` | Check server & extension status |
 | `bctl serve` | Start server in foreground |
 | `bctl stop` | Stop server |
+| `bctl setup` | Install extension to `~/.browser-ctl/extension/` + open Chrome extensions page |
+| `bctl setup cursor` | Install AI skill (`SKILL.md`) into Cursor IDE |
+| `bctl setup opencode` | Install AI skill into OpenCode |
+| `bctl setup <path>` | Install AI skill to a custom directory |
 <br>
@@ -353,9 +358,10 @@ Non-zero exit code on errors — works naturally with `set -e` and `&&` chains.
 | Component | Details |
 |-----------|---------|
-| **CLI** | Stdlib only, communicates via HTTP |
+| **CLI** | Stdlib only, raw-socket HTTP (zero heavy imports, ~5ms cold start) |
 | **Bridge Server** | Async relay (aiohttp), auto-daemonizes |
 | **Extension** | MV3 service worker, auto-reconnects via `chrome.alarms` |
+| **Click** | Three-phase: pointer events → MAIN-world click → `window.open()` interception for SPA compatibility |
 | **Eval** | Dual strategy: MAIN-world injection (fast) + CDP fallback (CSP-safe) |
 <br>

browser_ctl-0.2.6/browser_ctl/SKILL.md ADDED

@@ -0,0 +1,193 @@
+---
+name: browser-ctl
+description: Control the user's Chrome browser via CLI commands that return JSON. Use when the user asks to interact with a browser, navigate web pages, click elements, extract page content, take screenshots, download files, or perform any browser automation task.
+---
+# Browser-Ctl
+Control Chrome via CLI. All commands return JSON to stdout.
+## Prerequisites
+- Chrome with the Browser-Ctl extension loaded
+- Bridge server (auto-starts on first `bctl` command)
+## Always Start With
+```bash
+bctl ping
+```
+If extension is not connected, tell the user to check Chrome and the extension.
+## Core Principle: Text-First Page Perception
+**NEVER use `bctl screenshot` to understand page state.** Use text-based commands:
+1. `bctl status` — current URL + title
+2. `bctl text "<sel>"` — read visible text
+3. `bctl select "<sel>"` — discover page structure (tag, text, id, class, href, src, aria-label)
+4. `bctl snapshot` — list all interactive elements with refs (e0, e1, …)
+5. `bctl count "<sel>"` — check if elements exist and how many
+6. `bctl attr "<sel>" "<name>"` — get specific attributes
+Only use `bctl screenshot` when the user explicitly asks for a visual capture.
+## Commands
+### Navigation
+```
+bctl navigate <url>       Go to URL (aliases: nav, go; auto-prepends https://)
+bctl back                 Go back
+bctl forward              Go forward (alias: fwd)
+bctl reload               Reload page
+```
+### Interaction
+All `<sel>` accept CSS selectors or snapshot refs (e.g. `e5`).
+```
+bctl click <sel> [-i N] [-t text]    Click element; -t filters by visible text
+bctl dblclick <sel> [-i N] [-t text] Double-click
+bctl hover <sel> [-i N] [-t text]    Hover (triggers mouseover)
+bctl focus <sel> [-i N] [-t text]    Focus element
+bctl type <sel> <text>               Type text (replaces existing; React-compatible)
+bctl input-text <sel> <text> [--clear] [--delay ms]  Char-by-char (rich editors)
+bctl press <key>                     Press key: Enter, Escape, Tab, ArrowDown, etc.
+bctl check <sel> [-i N] [-t text]    Check checkbox/radio
+bctl uncheck <sel> [-i N] [-t text]  Uncheck checkbox
+bctl scroll <dir|sel> [n]            Scroll: up/down/top/bottom/<selector> [pixels]
+bctl select-option <sel> <val> [--text]  Select dropdown option (alias: sopt)
+bctl drag <src> [target] [--dx N --dy N] Drag element to target or by offset
+```
+### Query
+```
+bctl snapshot [--all]     List interactive elements as e0, e1, … (alias: snap)
+bctl text [sel]           Get text content (default: body)
+bctl html [sel]           Get innerHTML
+bctl attr <sel> [name]    Get attribute(s) [-i N for Nth element]
+bctl select <sel> [-l N]  List matching elements (alias: sel, limit default: 20)
+bctl count <sel>          Count matching elements
+bctl status               Current page URL and title
+bctl is-visible <sel>     Check if element is visible (returns rect)
+bctl get-value <sel>      Get form element value (input/select/textarea)
+```
+### JavaScript
+```
+bctl eval <code>          Execute JS in page context (MAIN world)
+```
+### Tabs
+```
+bctl tabs                 List all tabs (id, url, title, active)
+bctl tab <id>             Switch to tab
+bctl new-tab [url]        Open new tab
+bctl close-tab [id]       Close tab (default: active)
+```
+### Screenshot & Download
+```
+bctl screenshot [path]    Capture screenshot (alias: ss)
+bctl download <target>    Download file/image (alias: dl) [-o path] [-i N]
+bctl upload <sel> <files> Upload file(s) to <input type="file">
+```
+Downloads use `chrome.downloads` API and carry the browser's full auth session — use
+this instead of `curl` for sites requiring login.
+### Wait
+```
+bctl wait <sel|seconds>   Wait for element or sleep [timeout]
+```
+### Dialog
+```
+bctl dialog [accept|dismiss] [--text <val>]  Handle next alert/confirm/prompt
+```
+### Batch / Pipe
+```
+bctl pipe                 Read commands from stdin (one per line, JSONL output)
+bctl batch '<c1>' '<c2>'  Execute multiple commands in one call
+```
+Use `bctl pipe` for 2+ consecutive commands on the same page — merges into a single
+browser call, reducing overhead by ~90%.
+### Server
+```
+bctl ping                 Check server and extension status
+bctl serve                Start server (foreground)
+bctl stop                 Stop server
+```
+## Output Format
+All commands return JSON:
+- Success: `{"success": true, "data": {...}}`
+- Error: `{"success": false, "error": "..."}`
+Parse with `jq`: `bctl status | jq -r '.data.title'`
+## Best Practices
+### Snapshot-first Workflow
+Use `bctl snapshot` to get a numbered list of interactive elements, then operate by
+ref. This eliminates guessing CSS selectors on unfamiliar pages:
+```bash
+bctl snapshot                    # List all interactive elements
+bctl click e3                    # Click the 3rd element
+bctl type e7 "hello world"      # Type into the 7th element
+```
+### Click by Text (SPA-friendly)
+Use `-t` to filter by visible text — ideal for SPAs where class names are dynamic:
+```bash
+bctl click "button" -t "Submit"   # click button containing "Submit"
+```
+### SPA Video Sites (Tencent Video, Bilibili, etc.)
+`bctl click` intercepts `window.open()` calls from SPA frameworks and opens the
+target URL via `chrome.tabs.create`. Just click like a normal user:
+```bash
+bctl go "https://v.qq.com" && bctl wait 2
+bctl type "input" "西游记" && bctl press Enter && bctl wait 3
+bctl click ".root.list-item .poster-view" -i 0   # opens video in new tab
+```
+Fallback — extract content ID and navigate directly:
+```bash
+bctl attr ".root.list-item [dt-eid='poster']" "dt-params" | grep -o 'cid=[^&]*'
+bctl go "https://v.qq.com/x/cover/<cid>.html"
+```
+### Waiting Strategy
+- After navigation: `bctl wait 2-3` or `bctl wait "<selector>" 10`
+- After hover for overlay: `bctl wait 1`
+- AI generation: **poll** with `bctl wait 5 && bctl count "selector"` in a loop
+### Data Extraction
+Prefer `bctl select` over `bctl eval` — it's more reliable, works on all sites,
+and returns text/href/id/class/aria-label automatically.
+## Efficiency Tips
+1. **NEVER screenshot to "see" the page.** Use `status` + `text` + `select` + `snapshot`.
+2. **Use `count` before `click`** when you expect multiple matches.
+3. **Use `download` for authenticated resources** — never `curl` from sites behind login.
+4. **Use `hover` before clicking overlay buttons** — many UIs hide actions until hover.
+5. **Check `tabs` after tab-opening actions** — popups may switch the active tab.
+6. **Chain commands** with `&&`: `bctl go "https://example.com" && bctl wait 2 && bctl status`
+## Known Limitations
+- `eval` blocked by Trusted Types on some sites (Gemini, YouTube) — use `attr`/`select` instead
+- `screenshot` captures visible viewport only — scroll for full-page capture
+- Without `-i`, `click` always hits the FIRST match — use `count` to check first
+## Error Handling
+- `bctl ping` shows `"extension": false` → user must check Chrome and the extension
+- Selector fails → use `bctl select` or `bctl count` to debug
+- Dynamic content → use `bctl wait` before interacting
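The Output Format section of SKILL.md defines one envelope for every command, and `bctl pipe` emits one such object per line (JSONL). A minimal sketch of consuming that output from Python, assuming only the documented `success`, `data` and `error` keys; `BctlError`, `unwrap` and `parse_jsonl` are hypothetical names, and the sample lines are invented for illustration:

```python
import json


class BctlError(RuntimeError):
    """Raised for a bctl result whose "success" field is false."""


def unwrap(result: dict):
    """Return result["data"] on success; raise BctlError with the error text otherwise."""
    if result.get("success"):
        return result.get("data")
    raise BctlError(result.get("error", "unknown error"))


def parse_jsonl(stdout: str) -> list[dict]:
    """Parse `bctl pipe` output: one JSON object per non-empty line."""
    return [json.loads(line) for line in stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = (
        '{"success": true, "data": {"title": "Example"}}\n'
        '{"success": false, "error": "Element not found: h2"}\n'
    )
    for result in parse_jsonl(sample):
        try:
            print("ok:", unwrap(result))
        except BctlError as exc:
            print("failed:", exc)
```

In real use the input string would come from something like `subprocess.run(["bctl", "pipe"], input="status\ntext h1\n", capture_output=True, text=True).stdout`. Because `bctl` exits non-zero on errors, leave `check` off if you want to inspect every emitted line.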

{browser_ctl-0.2.4 → browser_ctl-0.2.6}/browser_ctl/cli.py RENAMED

@@ -8,24 +8,23 @@ via HTTP POST to localhost:19876/command.
 from __future__ import annotations
 import argparse
-import base64
 import json
 import os
-import platform
 import shlex
-import shutil
-import subprocess
 import sys
-from browser_ctl.client import (
-	BCTL_HOME,
-	DEFAULT_PORT,
-	ensure_server,
-	send_batch,
-	send_command,
-	send_raw,
-	stop_server,
-)
+# Lazy-loaded — avoids importing urllib/subprocess on module load.
+# Populated on first use via _client().
+_client_mod = None
+def _client():
+	"""Lazy import of browser_ctl.client to avoid startup overhead."""
+	global _client_mod
+	if _client_mod is None:
+		from browser_ctl import client as _mod
+		_client_mod = _mod
+	return _client_mod
 SKILL_TARGETS = {
 	"cursor": os.path.join(os.path.expanduser("~"), ".cursor", "skills-cursor"),
@@ -68,7 +67,7 @@ CONTENT_SCRIPT_OPS = frozenset({
 def handle_pipe(args):
 	"""Read commands from stdin, execute them with smart batching, print JSONL."""
-	ensure_server()
+	_client().ensure_server_optimistic()
 	parser = build_parser()
 	# Collect all commands first
@@ -93,7 +92,7 @@ def handle_pipe(args):
 def handle_batch(args):
 	"""Execute multiple commands given as CLI arguments with smart batching."""
-	ensure_server()
+	_client().ensure_server_optimistic()
 	parser = build_parser()
 	pending: list[tuple[str, dict]] = []
@@ -130,7 +129,7 @@ def _execute_with_batching(commands: list[tuple[str, dict]], continue_on_error:
 			if len(batch) == 1:
 				# Single command — use normal endpoint (no overhead)
-				result = send_raw(batch[0]["action"], batch[0]["params"])
+				result = _client().send_raw(batch[0]["action"], batch[0]["params"])
 				print(json.dumps(result, ensure_ascii=False), flush=True)
 				if not result.get("success"):
 					had_error = True
@@ -138,7 +137,7 @@ def _execute_with_batching(commands: list[tuple[str, dict]], continue_on_error:
 						sys.exit(1)
 			else:
 				# Multiple consecutive content-script ops — use /batch
-				result = send_batch(batch)
+				result = _client().send_batch(batch)
 				if result.get("success") and "results" in result.get("data", {}):
 					for r in result["data"]["results"]:
 						print(json.dumps(r, ensure_ascii=False), flush=True)
@@ -154,7 +153,7 @@ def _execute_with_batching(commands: list[tuple[str, dict]], continue_on_error:
 						sys.exit(1)
 		else:
 			# Non-batchable command — send individually
-			result = send_raw(action, params)
+			result = _client().send_raw(action, params)
 			print(json.dumps(result, ensure_ascii=False), flush=True)
 			if not result.get("success"):
 				had_error = True
@@ -352,14 +351,15 @@ def build_parser() -> argparse.ArgumentParser:
 def handle_screenshot(args):
 	"""Screenshot needs special handling for file save."""
-	ensure_server()
-	result = send_raw("screenshot", {})
+	_client().ensure_server_optimistic()
+	result = _client().send_raw("screenshot", {})
 	if not result.get("success"):
 		print(json.dumps(result, ensure_ascii=False))
 		sys.exit(1)
 	if args.path:
 		# Save to file
+		import base64
 		b64 = result["data"]["base64"]
 		img_bytes = base64.b64decode(b64)
 		with open(args.path, "wb") as f:
@@ -377,7 +377,7 @@ def handle_download(args):
 	we send only the basename to the extension and then move the downloaded
 	file to the requested location.
 	"""
-	ensure_server()
+	_client().ensure_server_optimistic()
 	target = args.target
 	output = args.output
@@ -394,13 +394,14 @@ def handle_download(args):
 	else:
 		params["filename"] = output
-	result = send_raw("download", params)
+	result = _client().send_raw("download", params)
 	if not result.get("success"):
 		print(json.dumps(result, ensure_ascii=False))
 		sys.exit(1)
 	# Move downloaded file to the requested absolute path
 	if move_to and result.get("data", {}).get("filename"):
+		import shutil
 		src_path = result["data"]["filename"]
 		try:
 			shutil.move(src_path, move_to)
@@ -417,6 +418,7 @@ def handle_download(args):
 def handle_serve(args):
 	"""Run server in foreground."""
+	from browser_ctl.client import DEFAULT_PORT
 	os.execvp(sys.executable, [sys.executable, "-m", "browser_ctl.server", "--port", str(DEFAULT_PORT)])
@@ -464,6 +466,11 @@ def _get_package_version() -> str:
 def _install_extension() -> str | None:
 	"""Copy extension to ~/.browser-ctl/extension/ and try to open Chrome extensions page."""
+	import platform
+	import shutil
+	import subprocess
+	from browser_ctl.client import BCTL_HOME
 	src = _get_extension_source_dir()
 	if not src:
 		return None
@@ -513,6 +520,8 @@ def _install_extension() -> str | None:
 def _install_skill(target_dir: str) -> str:
 	"""Copy SKILL.md into <target_dir>/browser-ctl/."""
+	import shutil
 	src = os.path.join(os.path.dirname(os.path.abspath(__file__)), "SKILL.md")
 	if not os.path.isfile(src):
 		raise FileNotFoundError("SKILL.md not found in browser_ctl package.")
@@ -687,7 +696,7 @@ def main():
 		handle_serve(args)
 		return
 	if cmd == "stop":
-		stop_server()
+		_client().stop_server()
 		return
 	# Screenshot (special handling)
@@ -712,7 +721,7 @@ def main():
 	# Standard command: parse args, send to server
 	action, params = args_to_action_params(cmd, args)
-	send_command(action, params)
+	_client().send_command(action, params)
 if __name__ == "__main__":

{browser_ctl-0.2.4 → browser_ctl-0.2.6}/browser_ctl/client.py RENAMED

@@ -2,25 +2,93 @@
 Handles server lifecycle (start/stop/health) and command relay.
 Zero external dependencies (stdlib only).
+Uses raw sockets for minimal import overhead (~5ms vs ~30ms for urllib).
 """
 from __future__ import annotations
 import json
 import os
-import subprocess
+import socket
 import sys
-import tempfile
-import time
-import urllib.error
-import urllib.request
 DEFAULT_PORT = 19876
-SERVER_URL = f"http://127.0.0.1:{DEFAULT_PORT}"
-PID_FILE = os.path.join(tempfile.gettempdir(), f"bctl-{DEFAULT_PORT}.pid")
+_HOST = "127.0.0.1"
 BCTL_HOME = os.path.join(os.path.expanduser("~"), ".browser-ctl")
+def _pid_file() -> str:
+	import tempfile
+	return os.path.join(tempfile.gettempdir(), f"bctl-{DEFAULT_PORT}.pid")
+# ---------------------------------------------------------------------------
+# Lightweight HTTP via raw sockets (avoids importing urllib — saves ~25ms)
+# ---------------------------------------------------------------------------
+def _http_post(path: str, body: bytes, timeout: float = 35) -> dict:
+	"""Send HTTP POST to bridge server, return parsed JSON response."""
+	try:
+		sock = socket.create_connection((_HOST, DEFAULT_PORT), timeout=timeout)
+	except (ConnectionRefusedError, OSError):
+		return {"success": False, "error": "Cannot connect to server"}
+	try:
+		req = (
+			f"POST {path} HTTP/1.0\r\n"
+			f"Host: {_HOST}:{DEFAULT_PORT}\r\n"
+			f"Content-Type: application/json\r\n"
+			f"Content-Length: {len(body)}\r\n"
+			f"\r\n"
+		).encode("utf-8") + body
+		sock.sendall(req)
+		# Read response
+		chunks = []
+		while True:
+			chunk = sock.recv(65536)
+			if not chunk:
+				break
+			chunks.append(chunk)
+		data = b"".join(chunks)
+	finally:
+		sock.close()
+	# Parse HTTP response — skip headers, find JSON body
+	parts = data.split(b"\r\n\r\n", 1)
+	if len(parts) < 2:
+		return {"success": False, "error": "Invalid response from server"}
+	try:
+		return json.loads(parts[1].decode("utf-8"))
+	except (json.JSONDecodeError, UnicodeDecodeError):
+		return {"success": False, "error": "Invalid response from server"}
+def _http_get(path: str, timeout: float = 1) -> int:
+	"""Send HTTP GET, return status code (0 on failure)."""
+	try:
+		sock = socket.create_connection((_HOST, DEFAULT_PORT), timeout=timeout)
+	except (ConnectionRefusedError, OSError):
+		return 0
+	try:
+		req = (
+			f"GET {path} HTTP/1.0\r\n"
+			f"Host: {_HOST}:{DEFAULT_PORT}\r\n"
+			f"\r\n"
+		).encode("utf-8")
+		sock.sendall(req)
+		# Only need the status line
+		resp = sock.recv(1024)
+	finally:
+		sock.close()
+	try:
+		status_line = resp.split(b"\r\n", 1)[0]
+		return int(status_line.split(b" ", 2)[1])
+	except (IndexError, ValueError):
+		return 0
 # ---------------------------------------------------------------------------
 # Server management
 # ---------------------------------------------------------------------------
@@ -28,23 +96,17 @@ BCTL_HOME = os.path.join(os.path.expanduser("~"), ".browser-ctl")
 def is_server_running() -> bool:
 	"""Check if bridge server is running (PID exists AND HTTP health check passes)."""
-	if not os.path.exists(PID_FILE):
+	pid_file = _pid_file()
+	if not os.path.exists(pid_file):
 		return False
 	try:
-		with open(PID_FILE) as f:
+		with open(pid_file) as f:
 			pid = int(f.read().strip())
 		os.kill(pid, 0)  # Check process exists
 	except (OSError, ValueError):
 		return False
 	# Process exists — verify it is actually accepting HTTP connections.
-	# This avoids a race where the PID is still alive but the server is
-	# shutting down (port already closed).
-	try:
-		req = urllib.request.Request(f"{SERVER_URL}/health")
-		resp = urllib.request.urlopen(req, timeout=1)
-		return resp.status == 200
-	except Exception:
-		return False
+	return _http_get("/health") == 200
 def start_server() -> bool:
@@ -52,6 +114,7 @@ def start_server() -> bool:
 	if is_server_running():
 		return False
+	import subprocess
 	cmd = [sys.executable, "-m", "browser_ctl.server", "--port", str(DEFAULT_PORT), "--daemon"]
 	subprocess.Popen(
 		cmd,
@@ -61,15 +124,11 @@ def start_server() -> bool:
 	)
 	# Wait for server to become responsive
+	import time
 	for _ in range(60):  # 3 seconds max
 		time.sleep(0.05)
-		try:
-			req = urllib.request.Request(f"{SERVER_URL}/health")
-			resp = urllib.request.urlopen(req, timeout=0.5)
-			if resp.status == 200:
-				return True
-		except Exception:
-			pass
+		if _http_get("/health") == 200:
+			return True
 	print(json.dumps({"success": False, "error": "Failed to start bridge server"}))
 	sys.exit(1)
@@ -83,6 +142,7 @@ def stop_server():
 	result = send_raw("shutdown", {})
 	if result.get("success"):
 		# Wait briefly for server to fully stop and clean up PID file
+		import time
 		for _ in range(20):
 			time.sleep(0.05)
 			if not is_server_running():
@@ -106,24 +166,16 @@ def ensure_server():
 def send_raw(action: str, params: dict) -> dict:
 	"""Send command to bridge server, return parsed response."""
 	body = json.dumps({"action": action, "params": params}).encode("utf-8")
-	req = urllib.request.Request(
-		f"{SERVER_URL}/command",
-		data=body,
-		headers={"Content-Type": "application/json"},
-	)
-	try:
-		resp = urllib.request.urlopen(req, timeout=35)
-		return json.loads(resp.read().decode("utf-8"))
-	except urllib.error.URLError as e:
-		return {"success": False, "error": f"Cannot connect to server: {e}"}
-	except json.JSONDecodeError:
-		return {"success": False, "error": "Invalid response from server"}
+	return _http_post("/command", body)
 def send_command(action: str, params: dict):
-	"""Ensure server, send command, print JSON result."""
-	ensure_server()
+	"""Optimistic send: try command first, start server only on failure."""
 	result = send_raw(action, params)
+	if not result.get("success") and "Cannot connect" in result.get("error", ""):
+		# Server not running — start it and retry
+		start_server()
+		result = send_raw(action, params)
 	print(json.dumps(result, ensure_ascii=False))
 	if not result.get("success"):
 		sys.exit(1)
@@ -132,15 +184,11 @@ def send_command(action: str, params: dict):
 def send_batch(commands: list[dict]) -> dict:
 	"""Send multiple commands to /batch endpoint, return parsed response."""
 	body = json.dumps({"commands": commands}).encode("utf-8")
-	req = urllib.request.Request(
-		f"{SERVER_URL}/batch",
-		data=body,
-		headers={"Content-Type": "application/json"},
-	)
-	try:
-		resp = urllib.request.urlopen(req, timeout=120)
-		return json.loads(resp.read().decode("utf-8"))
-	except urllib.error.URLError as e:
-		return {"success": False, "error": f"Cannot connect to server: {e}"}
-	except json.JSONDecodeError:
-		return {"success": False, "error": "Invalid response from server"}
+	return _http_post("/batch", body, timeout=120)
+def ensure_server_optimistic() -> None:
+	"""Start server if not running. Optimistic — only checks on first call."""
+	result = send_raw("ping", {})
+	if not result.get("success") and "Cannot connect" in result.get("error", ""):
+		start_server()
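client.py now speaks HTTP/1.0 over a plain socket. The request carries `Content-Length`, and an HTTP/1.0 request without keep-alive lets the server close the connection after replying, so reading until EOF yields the whole response and everything after the first blank line is the JSON body. `send_command` also becomes optimistic: it sends first and starts the server only when the connection fails. A self-contained sketch of both ideas; `post_json`, `send_optimistic` and the `send`/`start` callbacks are hypothetical stand-ins for `_http_post`, `send_command`, `send_raw` and `start_server`:

```python
import json
import socket
from typing import Callable


def post_json(host: str, port: int, path: str, payload: dict, timeout: float = 35) -> dict:
    """POST `payload` as JSON over a raw HTTP/1.0 socket and return the decoded body."""
    body = json.dumps(payload).encode("utf-8")
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:  # ConnectionRefusedError and connect timeouts are OSError subclasses
        return {"success": False, "error": "Cannot connect to server"}
    with sock:
        head = (
            f"POST {path} HTTP/1.0\r\n"
            f"Host: {host}:{port}\r\n"
            "Content-Type: application/json\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n"
        ).encode("ascii")
        sock.sendall(head + body)
        # HTTP/1.0: the server closes the socket when done, so read to EOF.
        chunks = []
        while chunk := sock.recv(65536):
            chunks.append(chunk)
    # Like the diff, this ignores the status line and takes the body as-is.
    _, sep, resp_body = b"".join(chunks).partition(b"\r\n\r\n")
    if not sep:
        return {"success": False, "error": "Invalid response from server"}
    try:
        return json.loads(resp_body.decode("utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        return {"success": False, "error": "Invalid response from server"}


def send_optimistic(send: Callable[[], dict], start: Callable[[], None]) -> dict:
    """Try the request first; start the server and retry only if it was unreachable."""
    result = send()
    if not result.get("success") and "Cannot connect" in result.get("error", ""):
        start()
        result = send()
    return result
```

Only an unreachable server triggers the retry path; a server that is up but returns `"success": false` is passed through unchanged, so a failing command is never sent twice.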

{browser_ctl-0.2.4 → browser_ctl-0.2.6}/browser_ctl/extension/background.js RENAMED

@@ -143,9 +143,9 @@ async function handleAction(action, params) {
     case "reload":
       return await doReload();
-    // -- Interaction (content-script) --
+    // -- Interaction --
     case "click":
-      return await runInPage("click", params);
+      return await doClick(params);
     case "hover":
       return await runInPage("hover", params);
     case "type":
@@ -438,6 +438,128 @@ function waitForDownload(downloadId, timeoutMs = 30000) {
   });
 }
+// ---------------------------------------------------------------------------
+// Click — Three-phase approach for maximum SPA compatibility:
+//   Phase 1 (ISOLATED): Find element, scrollIntoView, dispatch pointer/mouse events
+//   Phase 2 (MAIN): Hook window.open to capture blocked popup URLs, then dispatch
+//                   click event. The click triggers the site's normal handler which
+//                   may call window.open() — but since isTrusted=false, Chrome blocks
+//                   the popup. Our hook captures the URL instead.
+//   Phase 3 (background): If window.open was intercepted, navigate via chrome.tabs.
+// ---------------------------------------------------------------------------
+async function doClick(params) {
+  const tab = await activeTab();
+  // Phase 1: Find element + dispatch pointer/mouse events (ISOLATED world)
+  const phase1 = await chrome.scripting.executeScript({
+    target: { tabId: tab.id },
+    func: (selector, index, textFilter) => {
+      function qs(sel, idx, tf) {
+        if (sel && /^e\d+$/.test(sel)) {
+          const el = document.querySelector(`[data-bctl-ref="${sel}"]`);
+          if (!el) throw new Error(`Ref not found: ${sel}`);
+          return el;
+        }
+        if (!sel) return document.body;
+        let els = Array.from(document.querySelectorAll(sel));
+        if (tf) {
+          const lc = tf.toLowerCase();
+          els = els.filter((e) => e.textContent && e.textContent.toLowerCase().includes(lc));
+        }
+        if (!els.length) throw new Error(`Element not found: ${sel}${tf ? ` (text: "${tf}")` : ""}`);
+        const i = idx ?? 0;
+        const actual = i < 0 ? els.length + i : i;
+        if (actual < 0 || actual >= els.length)
+          throw new Error(`Index ${i} out of range (0..${els.length - 1}) for: ${sel}`);
+        return els[actual];
+      }
+      try {
+        const el = qs(selector, index, textFilter);
+        el.scrollIntoView({ block: "center", behavior: "instant" });
+        const rect = el.getBoundingClientRect();
+        const cx = rect.left + rect.width / 2;
+        const cy = rect.top + rect.height / 2;
+        const mOpts = { bubbles: true, cancelable: true, clientX: cx, clientY: cy, button: 0 };
+        el.dispatchEvent(new PointerEvent("pointerdown", { ...mOpts, pointerId: 1 }));
+        el.dispatchEvent(new MouseEvent("mousedown", mOpts));
+        el.dispatchEvent(new PointerEvent("pointerup", { ...mOpts, pointerId: 1 }));
+        el.dispatchEvent(new MouseEvent("mouseup", mOpts));
+        el.setAttribute("data-bctl-click-target", "1");
+        const total = selector ? document.querySelectorAll(selector).length : 1;
+        return { success: true, cx, cy, total };
+      } catch (e) {
+        return { success: false, error: e.message };
+      }
+    },
+    args: [params.selector, params.index, params.text],
+  });
+  const info = phase1[0]?.result;
+  if (!info || !info.success) throw new Error(info?.error || "Failed to locate element");
+  // Phase 2a: Install window.open hook in MAIN world (persists across ticks)
+  await chrome.scripting.executeScript({
+    target: { tabId: tab.id },
+    func: () => {
+      window.__bctlOrigOpen = window.open;
+      window.__bctlCapturedUrl = null;
+      window.open = function (url) {
+        if (url && typeof url === "string" && url.startsWith("http")) {
+          window.__bctlCapturedUrl = url;
+        }
+        return null; // Block the popup — we'll navigate via chrome.tabs
+      };
+    },
+    world: "MAIN",
+  });
+  // Phase 2b: Dispatch click in MAIN world (site's async handler will call
+  // window.open on a later tick — our hook persists and captures it)
+  await chrome.scripting.executeScript({
+    target: { tabId: tab.id },
+    func: (cx, cy) => {
+      const el = document.querySelector("[data-bctl-click-target]");
+      if (el) {
+        el.removeAttribute("data-bctl-click-target");
+        const opts = { bubbles: true, cancelable: true, clientX: cx, clientY: cy, button: 0, view: window };
+        el.dispatchEvent(new MouseEvent("click", opts));
+      }
+    },
+    args: [info.cx, info.cy],
+    world: "MAIN",
+  });
+  // Phase 2c: Wait for async handlers, then read captured URL and restore
+  await new Promise((r) => setTimeout(r, 200));
+  const phase2c = await chrome.scripting.executeScript({
+    target: { tabId: tab.id },
+    func: () => {
+      const url = window.__bctlCapturedUrl;
+      window.open = window.__bctlOrigOpen;
+      delete window.__bctlCapturedUrl;
+      delete window.__bctlOrigOpen;
+      return { capturedUrl: url };
+    },
+    world: "MAIN",
+  });
+  const clickResult = phase2c[0]?.result;
+  // Phase 3: If window.open was intercepted, navigate via chrome.tabs
+  if (clickResult?.capturedUrl) {
+    await chrome.tabs.create({ url: clickResult.capturedUrl, active: true });
+  }
+  return {
+    clicked: params.selector || "body",
+    index: params.index ?? 0,
+    total: info.total,
+    text: params.text || null,
+  };
+}
 // ---------------------------------------------------------------------------
 // Press key (via debugger or content script)
 // ---------------------------------------------------------------------------
@@ -792,17 +914,20 @@ async function contentScriptHandler(commands) {
       case "click": {
         const el = qs(params.selector, params.index, params.text);
         el.scrollIntoView({ block: "center", behavior: "instant" });
-        // Dispatch full pointer/mouse sequence first for Vue/React SPA compatibility,
+        // Dispatch full pointer/mouse sequence for Vue/React SPA compatibility.
+        // We dispatch a MouseEvent("click") WITH coordinates first (for frameworks
+        // that use event delegation based on clientX/clientY, e.g. Tencent Video),
         // then call native el.click() which produces a trusted (isTrusted:true) event
-        // that sites like GitHub require.
+        // (for sites like GitHub that require trusted events).
         const rect = el.getBoundingClientRect();
         const cx = rect.left + rect.width / 2;
         const cy = rect.top + rect.height / 2;
-        const mOpts = { bubbles: true, cancelable: true, clientX: cx, clientY: cy, button: 0 };
+        const mOpts = { bubbles: true, cancelable: true, clientX: cx, clientY: cy, button: 0, view: window };
         el.dispatchEvent(new PointerEvent("pointerdown", { ...mOpts, pointerId: 1 }));
         el.dispatchEvent(new MouseEvent("mousedown", mOpts));
         el.dispatchEvent(new PointerEvent("pointerup", { ...mOpts, pointerId: 1 }));
         el.dispatchEvent(new MouseEvent("mouseup", mOpts));
+        el.dispatchEvent(new MouseEvent("click", mOpts));
         el.click();
         const total = params.selector ? document.querySelectorAll(params.selector).length : 1;
         return { clicked: params.selector || "body", index: params.index ?? 0, total, text: params.text || null };

{browser_ctl-0.2.4 → browser_ctl-0.2.6}/browser_ctl/extension/manifest.json RENAMED

@@ -1,7 +1,7 @@
 {
   "manifest_version": 3,
   "name": "Browser-Ctl",
-  "version": "0.2.4",
+  "version": "0.2.6",
   "description": "Developer tool for CLI-driven browser automation. Control Chrome via command-line — navigate, click, type, query DOM, capture screenshots, and download files, all through a local WebSocket bridge.",
   "permissions": [
     "tabs",

{browser_ctl-0.2.4 → browser_ctl-0.2.6}/browser_ctl/server.py RENAMED

@@ -189,13 +189,15 @@ async def health_handler(request: web.Request) -> web.Response:
 # Helpers
 # ---------------------------------------------------------------------------
-# Operations that run inside content scripts and can be batched.
-# Operations executed inside content scripts — can be batched into a single
+# Operations that run inside content scripts and can be batched into a single
 # chrome.scripting.executeScript call.  "eval" is excluded because it uses
 # MAIN-world script-tag injection + CDP debugger fallback.
 _CONTENT_SCRIPT_OPS = frozenset({
-	"click", "hover", "type", "press", "text", "html", "attr",
-	"select", "count", "scroll", "select-option", "drag", "wait",
+	"click", "dblclick", "hover", "focus", "type", "input-text",
+	"press", "check", "uncheck",
+	"text", "html", "attr", "select", "count", "snapshot",
+	"is-visible", "get-value",
+	"scroll", "select-option", "drag", "wait",
 })
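The `_CONTENT_SCRIPT_OPS` set decides which operations may share one `chrome.scripting.executeScript` call. Below is a minimal sketch of how consecutive commands could be grouped against it. It assumes a hypothetical `group_batches` helper and `(op, params)` tuples, neither of which appears in this diff. Any op outside the set, such as `eval` or `navigate`, closes the current batch and runs alone.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Set copied from the 0.2.6 server.py hunk above.
_CONTENT_SCRIPT_OPS = frozenset({
    "click", "dblclick", "hover", "focus", "type", "input-text",
    "press", "check", "uncheck",
    "text", "html", "attr", "select", "count", "snapshot",
    "is-visible", "get-value",
    "scroll", "select-option", "drag", "wait",
})

# Hypothetical command shape: (operation name, parameters).
Command = Tuple[str, Dict[str, Any]]


def group_batches(commands: Iterable[Command]) -> List[List[Command]]:
    """Split a command sequence into browser calls (hypothetical helper).

    Runs of consecutive content-script ops are merged into one batch;
    every other op (navigate, eval, screenshot, ...) gets a batch of its own.
    """
    batches: List[List[Command]] = []
    current: List[Command] = []
    for op, params in commands:
        if op in _CONTENT_SCRIPT_OPS:
            current.append((op, params))  # extend the running batch
            continue
        if current:  # a non-batchable op ends the running batch
            batches.append(current)
            current = []
        batches.append([(op, params)])
    if current:  # flush a trailing batch
        batches.append(current)
    return batches
```

For the sequence type, type, click, navigate, text, this yields three batches of sizes 3, 1 and 1.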

{browser_ctl-0.2.4 → browser_ctl-0.2.6}/browser_ctl.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: browser-ctl
-Version: 0.2.4
+Version: 0.2.6
 Summary: Control your browser from the command line via a Chrome extension + WebSocket bridge
 Author-email: geb <853934146@qq.com>
 License-Expression: MIT
@@ -68,6 +68,7 @@ Tools like [browser-use](https://github.com/browser-use/browser-use), [Playwrigh
 | **Complex SDK integration** — requires importing libraries and writing async code | browser-use, Stagehand | Pure CLI with JSON output — any LLM can call `bctl click "button"` |
 | **Heavy dependencies** — Playwright alone pulls ~50 MB of packages + browser binary | Playwright, Puppeteer | CLI is stdlib-only; server needs only `aiohttp` |
 | **Token-inefficient for LLMs** — verbose API calls waste context window tokens | SDK-based tools | Concise commands: `bctl text h1` vs pages of boilerplate |
+| **Broken clicks on SPAs** — programmatic clicks get blocked by popup blockers | Puppeteer, Playwright | Intercepts `window.open()` and navigates via `chrome.tabs` — SPA-compatible |
 <br>
@@ -218,6 +219,10 @@ All `<sel>` arguments accept CSS selectors **or** element refs from `snapshot` (
 | `bctl ping` | Check server & extension status |
 | `bctl serve` | Start server in foreground |
 | `bctl stop` | Stop server |
+| `bctl setup` | Install extension to `~/.browser-ctl/extension/` + open Chrome extensions page |
+| `bctl setup cursor` | Install AI skill (`SKILL.md`) into Cursor IDE |
+| `bctl setup opencode` | Install AI skill into OpenCode |
+| `bctl setup <path>` | Install AI skill to a custom directory |
 <br>
@@ -379,9 +384,10 @@ Non-zero exit code on errors — works naturally with `set -e` and `&&` chains.
 | Component | Details |
 |-----------|---------|
-| **CLI** | Stdlib only, communicates via HTTP |
+| **CLI** | Stdlib only, raw-socket HTTP (zero heavy imports, ~5ms cold start) |
 | **Bridge Server** | Async relay (aiohttp), auto-daemonizes |
 | **Extension** | MV3 service worker, auto-reconnects via `chrome.alarms` |
+| **Click** | Three-phase: pointer events → MAIN-world click → `window.open()` interception for SPA compatibility |
 | **Eval** | Dual strategy: MAIN-world injection (fast) + CDP fallback (CSP-safe) |
 <br>
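The architecture table says the 0.2.6 CLI speaks HTTP over a raw socket to avoid heavy imports. Below is a stdlib-only sketch of that idea. The host, port and `/command` path are placeholders, because the real endpoint is not shown in this diff. The sketch also assumes the server answers with a plain `Content-Length` body rather than chunked encoding, and that the body is JSON.

```python
import json
import socket
from typing import Tuple


def post_json(host: str, port: int, path: str, payload: dict,
              timeout: float = 30.0) -> Tuple[int, dict]:
    """Send a JSON POST over a plain TCP socket; return (status, decoded body)."""
    body = json.dumps(payload).encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"  # ask the server to close, so we can read to EOF
        "\r\n"
    ).encode("ascii")

    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(head + body)
        chunks = []
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # EOF: the server closed the connection
                break
            chunks.append(chunk)

    raw = b"".join(chunks)
    # Headers and body are separated by the first empty line.
    header_block, _, resp_body = raw.partition(b"\r\n\r\n")
    status_line = header_block.split(b"\r\n", 1)[0]  # e.g. b"HTTP/1.1 200 OK"
    status = int(status_line.split()[1])
    return status, json.loads(resp_body.decode("utf-8"))
```

Reading until EOF is the simplest framing. A client that reused connections would instead have to honour `Content-Length` and stop reading after that many body bytes.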

{browser_ctl-0.2.4 → browser_ctl-0.2.6}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "browser-ctl"
-version = "0.2.4"
+version = "0.2.6"
 description = "Control your browser from the command line via a Chrome extension + WebSocket bridge"
 readme = "README.md"
 license = "MIT"

browser_ctl-0.2.4/browser_ctl/SKILL.md DELETED

@@ -1,238 +0,0 @@
-# browser-ctl
-CLI tool for browser automation. Control Chrome from the terminal via `bctl` commands.
-All commands communicate through a Chrome extension + WebSocket bridge and return JSON.
-## When to Use
-Use browser-ctl when you need to:
-- Navigate web pages, click elements, type text, press keys
-- Snapshot interactive elements and operate them by ref (e0, e1, …)
-- Query the DOM: get text, HTML, attributes, values, or count elements
-- Take screenshots or download files (preserves browser auth/cookies)
-- Execute arbitrary JavaScript in the page context
-- Manage browser tabs (list, switch, open, close)
-- Automate browser workflows for testing or data extraction
-## Prerequisites
-- Chrome with the Browser-Ctl extension loaded
-- Bridge server (auto-starts with any `bctl` command)
-## Commands
-### Navigation
-```
-bctl navigate <url>       Navigate to URL (aliases: nav, go; auto-prepends https://)
-bctl back                 Go back
-bctl forward              Go forward (alias: fwd)
-bctl reload               Reload page
-```
-### Interaction
-All `<sel>` arguments accept CSS selectors or element refs (e.g. `e5` from `snapshot`).
-```
-bctl click <sel> [-i N] [-t text]    Click element; -t filters by visible text (substring)
-bctl dblclick <sel> [-i N] [-t text] Double-click element
-bctl hover <sel> [-i N] [-t text]    Hover over element
-bctl focus <sel> [-i N] [-t text]    Focus element
-bctl type <sel> <text>               Type text (replaces existing; React-compatible)
-bctl input-text <sel> <text> [--clear] [--delay ms]  Char-by-char typing (rich text editors)
-bctl press <key>                     Press key — Enter submits forms, Escape closes dialogs
-bctl check <sel> [-i N] [-t text]    Check checkbox/radio
-bctl uncheck <sel> [-i N] [-t text]  Uncheck checkbox
-bctl scroll <dir|sel> [n]            Scroll page: up/down/top/bottom or element into view
-bctl select-option <sel> <val> [--text]  Select <select> dropdown option (alias: sopt)
-bctl drag <src> [target]             Drag element to target [--dx N --dy N for offset]
-```
-### Query
-```
-bctl snapshot [--all]     List interactive elements with refs e0, e1, … (alias: snap)
-bctl text [sel]           Get text content (default: body)
-bctl html [sel]           Get innerHTML
-bctl attr <sel> [name]    Get attribute(s) [-i N for Nth element]
-bctl select <sel> [-l N]  List matching elements (alias: sel, limit default: 20)
-bctl count <sel>          Count matching elements
-bctl status               Current page URL and title
-bctl is-visible <sel>     Check if element is visible (returns rect)
-bctl get-value <sel>      Get value of form element (input/select/textarea)
-```
-### JavaScript
-```
-bctl eval <code>          Execute JS in page context
-```
-### Tabs
-```
-bctl tabs                 List all tabs
-bctl tab <id>             Switch to tab
-bctl new-tab [url]        Open new tab
-bctl close-tab [id]       Close tab (default: active)
-```
-### Screenshot & Download
-```
-bctl screenshot [path]    Capture screenshot (alias: ss)
-bctl download <target>    Download file/image (alias: dl) [-o path] [-i N]
-bctl upload <sel> <files> Upload file(s) to <input type="file">
-```
-### Wait
-```
-bctl wait <sel|seconds>   Wait for element or sleep [timeout]
-```
-### Dialog
-```
-bctl dialog [accept|dismiss] [--text <val>]  Handle next alert/confirm/prompt
-```
-### Batch / Pipe
-```
-bctl pipe                 Read commands from stdin (one per line, JSONL output)
-bctl batch '<c1>' '<c2>'  Execute multiple commands in one call
-```
-### Server
-```
-bctl ping                 Check server and extension status
-bctl serve                Start server (foreground)
-bctl stop                 Stop server
-```
-## Output Format
-All commands return JSON:
-- Success: `{"success": true, "data": {...}}`
-- Error: `{"success": false, "error": "..."}`
-## Tips & Best Practices
-### Snapshot-first Workflow (recommended for AI agents)
-- **Use `bctl snapshot` to get a numbered list of interactive elements**, then operate
-  by ref (e.g. `bctl click e5`). This eliminates guessing CSS selectors.
-- Refs are assigned as `data-bctl-ref` attributes and persist until the next snapshot.
-- Example:
-  ```bash
-  bctl snapshot                    # List all interactive elements
-  bctl click e3                    # Click the 3rd interactive element
-  bctl type e7 "hello world"      # Type into the 7th element
-  bctl input-text e7 "hello" --clear --delay 20  # Char-by-char for rich editors
-  ```
-### Data Extraction
-- **Prefer `bctl select` over `bctl eval`** for extracting structured DOM data — it's
-  more reliable across all sites, returns text/href/id/class/aria-label automatically,
-  and doesn't require complex JS strings.
-- Use `bctl text <sel>` for simple text extraction and `bctl attr <sel> [name]` for
-  specific attributes. Chain with `-i N` for Nth element.
-- Reserve `bctl eval` for cases that truly need complex JS logic (e.g. mapping/filtering,
-  accessing page-defined variables, or computing derived values).
-### Search & Scrape Workflow
-A typical pattern for searching a site and extracting results:
-```bash
-bctl go "https://site.com/search?q=keyword"      # Navigate
-bctl wait ".results" 10                           # Wait for results
-bctl select ".result-item a" -l 10                # Extract links
-bctl attr ".result-item a" href -i 0              # Get specific attribute
-```
-### Waiting Strategy
-- Always `bctl wait <selector> [timeout]` or `bctl wait <seconds>` after navigation
-  before querying — SPAs like YouTube take time to render content.
-- Prefer waiting for a specific element over a fixed delay when possible.
-### Clicking by Text (SPA-friendly)
-- Use `--text` (`-t`) to filter elements by visible text — ideal for SPAs (React,
-  Vue, etc.) where CSS class names are dynamically generated and unreliable.
-- Example: `bctl click "button" -t "Submit"` clicks the first `<button>` whose
-  visible text contains "Submit" (case-insensitive substring match).
-- This avoids fragile selectors like `button.css-1a2b3c4` and eliminates the need
-  for `bctl eval 'document.querySelector(...).click()'` workarounds.
-### Batch / Pipe (prefer for multi-step workflows)
-- **Always use `bctl pipe` when performing 2+ consecutive commands** on the same
-  page. Consecutive DOM operations (click, type, scroll, wait…) are automatically
-  merged into a single browser call, reducing overhead by ~90%.
-- Pipe reads from stdin, one command per line (`#` comments and blank lines OK).
-  Each line is a normal bctl command without the `bctl` prefix.
-- Output is JSONL — one JSON object per command.
-- Example (fill a form in one shot):
-  ```
-  bctl pipe <<'EOF'
-  type "#email" "user@example.com"
-  type "#password" "secret"
-  click "button[type=submit]"
-  EOF
-  ```
-### Shell Quoting
-- Wrap CSS selectors in double quotes: `bctl click "button.submit"`
-- For `bctl eval`, use double quotes for the outer string and single quotes inside:
-  `bctl eval "document.querySelector('h1').textContent"`
-## Examples
-```bash
-# Navigate and inspect
-bctl go https://example.com
-bctl status
-bctl text h1
-# Snapshot workflow (recommended)
-bctl snapshot                       # See all interactive elements as e0, e1, …
-bctl click e3                       # Click element by ref
-bctl type e5 "hello"                # Type into element by ref
-bctl get-value e5                   # Read form value
-bctl is-visible e3                  # Check visibility
-# Click by selector or by text
-bctl click "button.login"
-bctl click "button" -t "Sign in"           # click button containing "Sign in"
-bctl dblclick "td.cell"                    # double-click
-bctl type "input[name=q]" "search query"
-bctl press Enter
-# Character-by-character input (rich text editors, contenteditable)
-bctl input-text "div[contenteditable]" "hello" --clear --delay 20
-# Checkbox / radio
-bctl check "input#agree"
-bctl uncheck "input#newsletter"
-# Scroll a long page
-bctl scroll down              # Scroll down ~80% viewport
-bctl scroll down 500          # Scroll down 500px
-bctl scroll up                # Scroll up
-bctl scroll top               # Scroll to top
-bctl scroll bottom            # Scroll to bottom
-bctl scroll "#section-3"      # Scroll element into view
-# Form interaction
-bctl select-option "select#country" "US"           # Select by value
-bctl select-option "select#lang" "English" --text  # Select by visible text
-bctl upload "input[type=file]" ./photo.jpg          # Upload file
-# Handle dialogs (call BEFORE triggering action)
-bctl dialog accept               # Auto-accept next alert/confirm
-bctl dialog dismiss              # Dismiss next confirm
-bctl dialog accept --text "yes"  # Answer next prompt with "yes"
-# Drag and drop
-bctl drag ".card-1" ".column-done"          # Drag to target element
-bctl drag ".slider-handle" --dx 100 --dy 0  # Drag by pixel offset
-# Wait then screenshot
-bctl wait ".loaded" 10
-bctl ss page.png
-# Download with browser auth
-bctl download "https://site.com/file.pdf" -o file.pdf
-# Extract structured data (prefer select over eval)
-bctl select "a.video-link" -l 10
-bctl eval "JSON.stringify(Array.from(document.querySelectorAll('a')).slice(0,5).map(a=>({text:a.textContent.trim(),href:a.href})))"
-```
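The deleted SKILL.md described `bctl pipe` input as one command per line, with `#` comments and blank lines allowed and no `bctl` prefix. Below is a sketch of parsing such a script with shell-style quoting. It assumes a hypothetical `parse_pipe_script` helper. It also treats only whole-line `#` comments as comments, so unquoted `#id` selectors are not cut off.

```python
import shlex
from typing import Iterable, List


def parse_pipe_script(lines: Iterable[str]) -> List[List[str]]:
    """Turn `bctl pipe` stdin lines into argv lists (hypothetical helper)."""
    commands: List[List[str]] = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and whole-line comments.
        if not line or line.startswith("#"):
            continue
        # Shell-style split: quotes group words, and "#" inside a line is kept.
        commands.append(shlex.split(line))
    return commands
```

Each resulting list is the argv of one command, such as `["click", "button[type=submit]"]`, ready to be dispatched as if `bctl` had been typed in front of it.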