npm - hyper-agent-browser - Versions diffs - 0.3.1 → 0.4.0 - Mend

hyper-agent-browser 0.3.1 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7) hide show

package/README.md +179 -263
package/package.json +1 -1
package/src/browser/manager.ts +50 -2
package/src/cli.ts +37 -0
package/src/commands/actions.ts +36 -1
package/src/commands/download.ts +202 -0
package/src/daemon/server.ts +21 -0

package/README.md CHANGED Viewed

@@ -1,64 +1,66 @@
 # hyper-agent-browser (hab)
-**纯浏览器自动化 CLI，专为 AI Agent 设计**
+**Pure Browser Automation CLI for AI Agents**
 [![npm version](https://img.shields.io/npm/v/hyper-agent-browser.svg)](https://www.npmjs.com/package/hyper-agent-browser)
 [![TypeScript](https://img.shields.io/badge/TypeScript-strict-blue.svg)](https://www.typescriptlang.org/)
 [![Bun](https://img.shields.io/badge/Bun-%3E%3D1.1.0-orange.svg)](https://bun.sh)
 [![License](https://img.shields.io/badge/license-MIT-green.svg)](./LICENSE)
-## ✨ 特性
+> 📖 [中文文档 (Chinese Documentation)](./docs/README_CN.md)
-- 🎯 **@eN 元素引用** - 无需手写选择器，自动生成 `@e1`, `@e2` 等引用
-- 🔐 **Session 持久化** - 保持登录状态，支持多账号隔离
-- 🎭 **反检测** - 基于 Patchright，绕过自动化检测
-- ⚡ **快速启动** - Bun 运行时，冷启动 ~25ms
-- 🤖 **AI Agent 友好** - 设计用于 Claude Code 等 AI Agent 调用
-- 🔒 **安全加固** - 沙箱隔离、权限控制、Session 保护
-- 📊 **数据提取** - 表格/列表/表单/元数据自动提取
-- 🌐 **网络监听** - 拦截 XHR/Fetch 请求，直接获取 API 数据
-- ⏳ **智能等待** - 网络空闲 + DOM 稳定双重策略
+## ✨ Features
-## 🚀 快速开始
+- 🎯 **@eN Element References** - No manual selectors needed, auto-generates `@e1`, `@e2` references
+- 🔐 **Session Persistence** - Maintains login state, supports multi-account isolation
+- 🎭 **Anti-Detection** - Built on Patchright, bypasses automation detection
+- ⚡ **Fast Startup** - Bun runtime, cold start ~25ms
+- 🤖 **AI Agent Friendly** - Designed for Claude Code and other AI agents
+- 🔒 **Security Hardened** - Sandbox isolation, permission control, session protection
+- 📊 **Data Extraction** - Auto-extract tables/lists/forms/metadata
+- 🌐 **Network Monitoring** - Intercept XHR/Fetch requests, get API data directly
+- ⏳ **Smart Waiting** - Network idle + DOM stable dual strategy
-### 安装
+## 🚀 Quick Start
-**使用 npm（推荐）**
+### Installation
+**Using npm (Recommended)**
 ```bash
-# 全局安装
+# Global install
 npm install -g hyper-agent-browser
-# 或使用 Bun
+# Or use Bun
 bun install -g hyper-agent-browser
-# 或使用 npx（无需安装）
+# Or use npx (no install needed)
 npx hyper-agent-browser --version
 ```
-**从源码安装**
+**From Source**
 ```bash
-git clone https://github.com/hubo1989/hyper-agent-browser.git
+git clone https://github.com/anthropics/hyper-agent-browser.git
 cd hyper-agent-browser
 bun install
-bun run build  # 构建二进制文件到 dist/hab
+bun run build  # Build binary to dist/hab
 ```
-**下载预编译二进制文件**
+**Download Pre-built Binary**
-访问 [GitHub Releases](https://github.com/hubo1989/hyper-agent-browser/releases) 下载对应平台的二进制文件。
+Visit [GitHub Releases](https://github.com/anthropics/hyper-agent-browser/releases) to download binaries for your platform.
-### 基础使用
+### Basic Usage
 ```bash
-# 1. 打开网页（有头模式，可以看到浏览器）
+# 1. Open a webpage (headed mode to see browser)
 hab --headed open https://google.com
-# 2. 获取可交互元素快照
+# 2. Get interactive elements snapshot
 hab snapshot -i
-# 输出示例:
+# Output example:
 # URL: https://google.com
 # Title: Google
 #
@@ -69,369 +71,283 @@ hab snapshot -i
 # @e4  [link]      "Gmail"
 # @e5  [link]      "Images"
-# 3. 使用 @eN 引用操作元素
+# 3. Use @eN references to interact
 hab fill @e1 "Bun JavaScript runtime"
 hab press Enter
-# 4. 等待页面加载
+# 4. Wait for page load
 hab wait 2000
-# 5. 截图
+# 5. Take screenshot
 hab screenshot -o result.png
-# 6. 获取页面内容
-hab content
 ```
-### Session 管理（多账号隔离）
+### Session Management (Multi-Account Isolation)
 ```bash
-# 个人 Gmail 账号
+# Personal Gmail account
 hab -s personal-gmail open https://mail.google.com
 hab -s personal-gmail snapshot -i
-# 工作 Gmail 账号
+# Work Gmail account
 hab -s work-gmail open https://mail.google.com
 hab -s work-gmail snapshot -i
-# 列出所有 Session
+# List all sessions
 hab sessions
-# 关闭特定 Session
+# Close specific session
 hab close -s personal-gmail
 ```
-### 数据提取（新增）
+### Data Extraction
 ```bash
-# 提取表格数据
+# Extract table data
 hab open https://example.com/users
 hab extract-table > users.json
-# 提取列表数据（自动检测商品/文章列表）
+# Extract list data (auto-detect product/article lists)
 hab extract-list --selector ".product-list" > products.json
-# 提取表单状态
+# Extract form state
 hab extract-form > form_data.json
-# 提取页面元数据（SEO/OG/Schema.org）
+# Extract page metadata (SEO/OG/Schema.org)
 hab extract-meta --include seo,og > metadata.json
 ```
-### 网络监听（新增）
+### Network Monitoring
 ```bash
-# 启动网络监听
+# Start network listener
 LISTENER_ID=$(hab network-start --filter xhr,fetch --url-pattern "*/api/*" | jq -r '.listenerId')
-# 执行操作（翻页/点击等）
+# Perform actions (pagination/clicks)
 hab click @e5
 hab wait-idle
-# 停止监听并获取所有 API 数据
+# Stop listener and get all API data
 hab network-stop $LISTENER_ID > api_data.json
 ```
-### 智能等待（新增）
+### Smart Waiting
 ```bash
-# 等待页面完全空闲（网络 + DOM）
+# Wait for page fully idle (network + DOM)
 hab wait-idle --timeout 30000
-# 等待元素可见
+# Wait for element visible
 hab wait-element "css=.data-row" --state visible
-# 等待加载动画消失
+# Wait for loading animation to disappear
 hab wait-element "css=.loading" --state detached
 ```
-## 📖 完整命令列表
+## 📖 Command Reference
-### 导航命令
+### Navigation Commands
-| 命令 | 说明 | 示例 |
-|------|------|------|
-| `open <url>` | 打开网页 | `hab open https://example.com` |
-| `reload` | 刷新当前页面 | `hab reload` |
-| `back` | 后退 | `hab back` |
-| `forward` | 前进 | `hab forward` |
+| Command | Description | Example |
+|---------|-------------|---------|
+| `open <url>` | Open webpage | `hab open https://example.com` |
+| `reload` | Refresh current page | `hab reload` |
+| `back` | Go back | `hab back` |
+| `forward` | Go forward | `hab forward` |
-### 操作命令
+### Action Commands
-| 命令 | 说明 | 示例 |
-|------|------|------|
-| `click <selector>` | 点击元素 | `hab click @e1` |
-| `fill <selector> <value>` | 填充输入框 | `hab fill @e1 "hello"` |
-| `type <text>` | 逐字输入文本 | `hab type "password"` |
-| `press <key>` | 按键 | `hab press Enter` |
-| `scroll <direction> [amount]` | 滚动页面 | `hab scroll down 500` |
-| `hover <selector>` | 悬停在元素上 | `hab hover @e3` |
-| `select <selector> <value>` | 选择下拉选项 | `hab select @e2 "Option 1"` |
-| `wait <ms\|condition>` | 等待时间或条件 | `hab wait 3000` |
+| Command | Description | Example |
+|---------|-------------|---------|
+| `click <selector>` | Click element | `hab click @e1` |
+| `fill <selector> <value>` | Fill input field | `hab fill @e1 "hello"` |
+| `type <text>` | Type text character by character | `hab type "password"` |
+| `press <key>` | Press key | `hab press Enter` |
+| `scroll <direction> [amount]` | Scroll page | `hab scroll down 500` |
+| `hover <selector>` | Hover over element | `hab hover @e3` |
+| `select <selector> <value>` | Select dropdown option | `hab select @e2 "Option 1"` |
+| `wait <ms\|condition>` | Wait for time or condition | `hab wait 3000` |
-### 信息命令
+### Info Commands
-| 命令 | 说明 | 示例 |
-|------|------|------|
-| `snapshot [-i\|--interactive]` | 获取页面快照 | `hab snapshot -i` |
-| `screenshot [-o <file>] [--full-page]` | 截图 | `hab screenshot -o page.png` |
-| `url` | 获取当前 URL | `hab url` |
-| `title` | 获取页面标题 | `hab title` |
-| `content [selector]` | 获取文本内容 | `hab content` |
-| `evaluate <script>` | 执行 JavaScript | `hab evaluate "document.title"` |
+| Command | Description | Example |
+|---------|-------------|---------|
+| `snapshot [-i\|--interactive]` | Get page snapshot | `hab snapshot -i` |
+| `screenshot [-o <file>] [--full-page]` | Take screenshot | `hab screenshot -o page.png` |
+| `url` | Get current URL | `hab url` |
+| `title` | Get page title | `hab title` |
+| `evaluate <script>` | Execute JavaScript | `hab evaluate "document.title"` |
-### Session 命令
+### Session Commands
-| 命令 | 说明 | 示例 |
-|------|------|------|
-| `sessions` | 列出所有 Session | `hab sessions` |
-| `close [-s <name>]` | 关闭 Session | `hab close -s gmail` |
+| Command | Description | Example |
+|---------|-------------|---------|
+| `sessions` | List all sessions | `hab sessions` |
+| `close [-s <name>]` | Close session | `hab close -s gmail` |
-### 全局选项
+### Global Options
-| 选项 | 说明 | 默认值 |
-|------|------|--------|
-| `-s, --session <name>` | 指定 Session 名称 | `default` |
-| `--headed` | 有头模式（显示浏览器） | `false` |
-| `--channel <chrome\|msedge>` | 浏览器类型 | `chrome` |
-| `--timeout <ms>` | 超时时间 | `30000` |
+| Option | Description | Default |
+|--------|-------------|---------|
+| `-s, --session <name>` | Session name | `default` |
+| `--headed` | Headed mode (show browser) | `false` |
+| `--channel <chrome\|msedge>` | Browser type | `chrome` |
+| `--timeout <ms>` | Timeout | `30000` |
-## 🤖 AI Agent 集成（Claude Code）
+## 🤖 AI Agent Integration (Claude Code)
-hyper-agent-browser 专为 AI Agent 设计，可与 Claude Code 无缝集成。
+hyper-agent-browser is designed for AI agents and integrates seamlessly with Claude Code.
-### 安装 Skill 文件
+### Install Skill File
 ```bash
-# 方法 1：从本地仓库复制
+# Method 1: Copy from local repo
 mkdir -p ~/.claude/skills/hyper-agent-browser
-cp skills/hyper-browser.md ~/.claude/skills/hyper-agent-browser/skill.md
+cp skills/hyper-agent-browser.md ~/.claude/skills/hyper-agent-browser/skill.md
-# 方法 2：直接下载
+# Method 2: Direct download
 mkdir -p ~/.claude/skills/hyper-agent-browser
 curl -o ~/.claude/skills/hyper-agent-browser/skill.md \
-  https://raw.githubusercontent.com/hubo1989/hyper-agent-browser/main/skills/hyper-browser.md
+  https://raw.githubusercontent.com/anthropics/hyper-agent-browser/main/skills/hyper-agent-browser.md
 ```
-### 使用示例
+### Usage Examples
-安装 Skill 后，Claude Code 会自动识别并使用 `hab` 命令。你可以这样指示 Claude：
+After installing the skill, Claude Code will automatically recognize and use `hab` commands:
 ```
-"帮我打开 Google 搜索 'Bun runtime' 并截图"
-"登录我的 Gmail 账号，找到未读邮件数量"
-"访问 Twitter 并获取首页的所有推文标题"
+"Help me open Google, search for 'Bun runtime' and take a screenshot"
+"Log into my Gmail account and find the number of unread emails"
+"Visit Twitter and get all tweet titles from the homepage"
 ```
-Claude 会自动：
-1. 使用 `hab open` 打开网页
-2. 使用 `hab snapshot -i` 获取元素引用
-3. 分析快照，找到目标元素（如 `@e5`）
-4. 使用 `hab click @e5` 等命令完成操作
-### Skill 功能
-- ✅ 自动解析 `@eN` 引用
-- ✅ Session 管理（多账号隔离）
-- ✅ 错误处理和重试
-- ✅ 浏览器状态保持
-- ✅ 登录状态持久化
-## 📋 选择器格式
-hyper-agent-browser 支持多种选择器格式：
-| 格式 | 示例 | 说明 | 推荐度 |
-|------|------|------|--------|
-| `@eN` | `@e1`, `@e5` | 元素引用（来自 snapshot） | ⭐⭐⭐⭐⭐ |
-| `css=` | `css=#login` | CSS 选择器 | ⭐⭐⭐ |
-| `text=` | `text=Sign in` | 文本匹配 | ⭐⭐⭐⭐ |
-| `xpath=` | `xpath=//button` | XPath 选择器 | ⭐⭐ |
-**推荐使用 `@eN` 引用**：
-- 无需手写选择器
-- 自动处理动态 ID/Class
-- AI Agent 友好
-## 🎯 核心功能详解
-### 1. 元素引用系统
-不需要手写复杂的选择器：
-```bash
-# 传统方式（繁琐、易出错）
-hab click 'css=button.MuiButton-root.MuiButton-contained.MuiButton-sizeMedium'
-# hyper-agent-browser 方式（简单、可靠）
-hab snapshot -i  # 自动生成 @e1, @e2... 引用
-hab click @e5    # 直接使用引用
-```
-### 2. Session 持久化
-每个 Session 有独立的：
-- 浏览器实例
-- UserData 目录（Cookies/LocalStorage）
-- 登录状态
-- 浏览历史
-```
-~/.hab/sessions/
-├── default/
-│   ├── userdata/      # Chrome UserData
-│   ├── session.json   # 元数据（wsEndpoint/pid/url）
-│   └── element-refs.json  # @eN 映射
-├── gmail-personal/
-└── gmail-work/
-```
-### 3. 浏览器复用
-CLI 每次调用是独立进程，但浏览器实例会持久化复用：
-```bash
-# 第一次：启动新浏览器 (~1-2s)
-hab --headed open https://google.com
-# 后续调用：复用浏览器 (~50ms)
-hab snapshot -i
-hab click @e1
-```
-## 🔒 安全特性
-hyper-agent-browser v0.1.0 包含全面的安全加固：
-### 1. evaluate 命令沙箱
-- ✅ 白名单模式（仅允许安全的 document/window 操作）
-- ✅ 增强黑名单（阻止 eval/Function/constructor/globalThis）
-- ✅ 结果大小限制（最大 100KB，防止数据窃取）
-### 2. Session 文件权限保护
-- ✅ session.json 权限设置为 `0o600`（仅所有者可读写）
-- ✅ 保护 wsEndpoint 不被其他进程劫持
-### 3. 配置文件权限保护
-- ✅ config.json 权限设置为 `0o600`
-- ✅ 保护敏感配置
-### 4. Chrome 扩展安全验证
+Claude will automatically:
+1. Use `hab open` to open the webpage
+2. Use `hab snapshot -i` to get element references
+3. Analyze the snapshot to find target elements (e.g., `@e5`)
+4. Use `hab click @e5` and other commands to complete the task
-- ✅ 扩展白名单机制
-- ✅ 自动检查扩展 manifest 危险权限
-- ✅ 过滤含 debugger/webRequest/proxy 权限的扩展
+## 📋 Selector Format
-### 5. 系统 Keychain 隔离
+| Format | Example | Description | Recommended |
+|--------|---------|-------------|-------------|
+| `@eN` | `@e1`, `@e5` | Element reference (from snapshot) | ⭐⭐⭐⭐⭐ |
+| `css=` | `css=#login` | CSS selector | ⭐⭐⭐ |
+| `text=` | `text=Sign in` | Text match | ⭐⭐⭐⭐ |
+| `xpath=` | `xpath=//button` | XPath selector | ⭐⭐ |
-- ✅ 默认使用隔离的密码存储
-- ✅ 通过 `HAB_USE_SYSTEM_KEYCHAIN=true` 显式启用
+**Recommended: Use `@eN` references**:
+- No manual selector writing
+- Auto-handles dynamic IDs/Classes
+- AI Agent friendly
-### 6. 配置键白名单验证
+## 🔒 Security Features
-- ✅ 仅允许修改安全的配置键
-- ✅ 阻止危险浏览器参数注入
+- ✅ **evaluate Sandbox** - Whitelist mode, blocks dangerous operations
+- ✅ **Session File Protection** - Permissions set to `0o600`
+- ✅ **Chrome Extension Verification** - Whitelist + dangerous permission filtering
+- ✅ **System Keychain Isolation** - Isolated password storage by default
+- ✅ **Config Key Whitelist** - Prevents dangerous browser argument injection
-## 🏗️ 架构
+## 🏗️ Architecture
 ```
 src/
-├── cli.ts              # CLI 入口（Commander.js）
+├── cli.ts              # CLI entry (Commander.js)
 ├── browser/
-│   ├── manager.ts      # 浏览器生命周期管理
-│   └── context.ts      # BrowserContext 封装
+│   └── manager.ts      # Browser lifecycle management
+├── daemon/
+│   ├── server.ts       # Daemon server
+│   ├── client.ts       # Daemon client
+│   └── browser-pool.ts # Browser instance pool
 ├── session/
-│   ├── manager.ts      # Session 管理（多浏览器实例）
-│   └── store.ts        # UserData 持久化
+│   ├── manager.ts      # Session management
+│   └── store.ts        # UserData persistence
 ├── commands/
 │   ├── navigation.ts   # open/reload/back/forward
 │   ├── actions.ts      # click/fill/type/press/scroll
 │   ├── info.ts         # snapshot/screenshot/evaluate
-│   └── session.ts      # sessions/close
+│   ├── extract.ts      # Data extraction commands
+│   └── network.ts      # Network monitoring
 ├── snapshot/
-│   ├── accessibility.ts    # 从 Accessibility Tree 提取元素
-│   ├── dom-extractor.ts    # DOM 提取器（fallback）
-│   ├── formatter.ts        # 格式化输出
-│   └── reference-store.ts  # @eN 映射存储
+│   ├── accessibility.ts    # Extract from Accessibility Tree
+│   ├── dom-extractor.ts    # DOM extractor (fallback)
+│   └── reference-store.ts  # @eN mapping storage
 └── utils/
-    ├── selector.ts     # 选择器解析
-    ├── config.ts       # 配置管理
-    ├── errors.ts       # 错误处理
-    └── logger.ts       # 日志
+    ├── selector.ts     # Selector parsing
+    ├── config.ts       # Config management
+    └── errors.ts       # Error handling
 ```
-## 📊 技术栈
+## 📊 Tech Stack
-- **Bun** 1.2.21 - JavaScript 运行时
-- **Patchright** 1.57.0 - 反检测 Playwright fork
-- **Commander.js** 12.1.0 - CLI 框架
-- **Zod** 3.25.76 - 数据验证
-- **Biome** 1.9.4 - 代码规范
+- **Bun** 1.2.21 - JavaScript runtime
+- **Patchright** 1.57.0 - Anti-detection Playwright fork
+- **Commander.js** 12.1.0 - CLI framework
+- **Zod** 3.25.76 - Data validation
+- **Biome** 1.9.4 - Code linting
-## 🛠️ 开发
+## 🛠️ Development
 ```bash
-# 克隆仓库
-git clone https://github.com/hubo1989/hyper-agent-browser.git
+# Clone repo
+git clone https://github.com/anthropics/hyper-agent-browser.git
 cd hyper-agent-browser
-# 安装依赖
+# Install dependencies
 bun install
-# 开发模式运行
+# Development mode
 bun dev -- --headed open https://google.com
-# 运行测试
+# Run tests
 bun test
-# 类型检查
+# Type check
 bun run typecheck
-# 代码规范检查
+# Lint
 bun run lint
-# 构建
-bun run build       # 当前平台
-bun run build:all   # 所有平台
+# Build
+bun run build       # Current platform
+bun run build:all   # All platforms
 ```
-## 📚 文档
+## 📚 Documentation
-- [GETTING_STARTED.md](./GETTING_STARTED.md) - 快速入门指南
-- [ELEMENT_REFERENCE_GUIDE.md](./ELEMENT_REFERENCE_GUIDE.md) - @eN 引用完整文档
-- [GOOGLE_PROFILE_GUIDE.md](./GOOGLE_PROFILE_GUIDE.md) - Google Profile 集成
-- [CLAUDE.md](./CLAUDE.md) - 开发者文档
-- [hyper-agent-browser-spec.md](./hyper-agent-browser-spec.md) - 技术规格
-- [Skill 文档](./skills/hyper-browser.md) - Claude Code Skill 说明
+- [Quick Start Guide](./GETTING_STARTED.md)
+- [Element Reference Guide](./ELEMENT_REFERENCE_GUIDE.md)
+- [Google Profile Integration](./GOOGLE_PROFILE_GUIDE.md)
+- [Developer Docs](./CLAUDE.md)
+- [Technical Spec](./hyper-agent-browser-spec.md)
+- [Skill Documentation](./skills/hyper-agent-browser.md)
+- [中文文档 (Chinese)](./docs/README_CN.md)
-## 🤝 贡献
+## 🤝 Contributing
-欢迎 Pull Requests！请确保：
+Pull Requests welcome! Please ensure:
-- ✅ TypeScript 类型检查通过：`bun run typecheck`
-- ✅ 测试通过：`bun test`
-- ✅ 代码规范检查通过：`bun run lint`
+- ✅ TypeScript type check passes: `bun run typecheck`
+- ✅ Tests pass: `bun test`
+- ✅ Lint passes: `bun run lint`
-## 📄 许可证
+## 📄 License
 [MIT](./LICENSE)
-## 🔗 相关链接
+## 🔗 Links
-- **npm 包**: https://www.npmjs.com/package/hyper-agent-browser
-- **GitHub**: https://github.com/hubo1989/hyper-agent-browser
-- **Issues**: https://github.com/hubo1989/hyper-agent-browser/issues
-- **Releases**: https://github.com/hubo1989/hyper-agent-browser/releases
+- **npm**: https://www.npmjs.com/package/hyper-agent-browser
+- **GitHub**: https://github.com/anthropics/hyper-agent-browser
+- **Issues**: https://github.com/anthropics/hyper-agent-browser/issues
+- **Releases**: https://github.com/anthropics/hyper-agent-browser/releases
-## 🙏 致谢
+## 🙏 Acknowledgments
-- [Patchright](https://github.com/Patchright/patchright) - 反检测 Playwright fork
-- [agent-browser](https://github.com/anthropics/agent-browser) - CLI 设计灵感
-- [Bun](https://bun.sh) - 快速的 JavaScript 运行时
-- [Claude Code](https://claude.ai/code) - AI 编程助手
+- [Patchright](https://github.com/Patchright/patchright) - Anti-detection Playwright fork
+- [Bun](https://bun.sh) - Fast JavaScript runtime
+- [Claude Code](https://claude.ai/code) - AI programming assistant
 ---

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "hyper-agent-browser",
-  "version": "0.3.1",
+  "version": "0.4.0",
   "description": "Pure browser automation CLI for AI Agents - 纯浏览器自动化 CLI，专为 AI Agent 设计",
   "type": "module",
   "main": "src/cli.ts",

package/src/browser/manager.ts CHANGED Viewed

@@ -149,14 +149,23 @@ export class BrowserManager {
       }
       // Use launchPersistentContext for UserData persistence
-      this.context = await chromium.launchPersistentContext(this.session.userDataDir, {
+      // 给启动加上超时保护（15秒）
+      const launchPromise = chromium.launchPersistentContext(this.session.userDataDir, {
         channel: this.options.channel,
         headless: !this.options.headed,
         args: launchArgs,
         ignoreDefaultArgs: ignoreArgs,
         viewport: { width: 1280, height: 720 },
+        timeout: 15000, // 15秒启动超时
       });
+      this.context = await Promise.race([
+        launchPromise,
+        new Promise<never>((_, reject) =>
+          setTimeout(() => reject(new Error("Browser launch timeout (15s)")), 15000),
+        ),
+      ]);
       // Extract browser from context
       // @ts-ignore - context has _browser property
       this.browser = this.context._browser;
@@ -334,6 +343,22 @@ export class BrowserManager {
       // 重新连接
       await this.connect();
     }
+    // 确保返回当前活动页面（可能有多个页面时需要获取最新的）
+    if (this.context) {
+      const pages = this.context.pages();
+      if (pages.length > 0) {
+        // 优先返回非 about:blank 的页面
+        const activePage = pages.find((p) => p.url() !== "about:blank") || pages[pages.length - 1];
+        if (activePage !== this.page) {
+          this.page = activePage;
+          if (this.options.timeout) {
+            this.page.setDefaultTimeout(this.options.timeout);
+          }
+        }
+      }
+    }
     return this.page!;
   }
@@ -353,7 +378,30 @@ export class BrowserManager {
   async close(): Promise<void> {
     if (this.browser) {
-      await this.browser.close();
+      // 获取 PID 以便超时后强制 kill
+      const pid = this.getPid();
+      try {
+        // 给 close 操作 5 秒超时
+        await Promise.race([
+          this.browser.close(),
+          new Promise((_, reject) =>
+            setTimeout(() => reject(new Error("Browser close timeout")), 5000),
+          ),
+        ]);
+      } catch (error) {
+        console.log("Browser close failed or timed out, forcing cleanup...");
+        // 强制 kill 进程
+        if (pid) {
+          try {
+            process.kill(pid, "SIGKILL");
+            console.log(`Force killed browser process (PID: ${pid})`);
+          } catch {
+            // 进程可能已经退出
+          }
+        }
+      }
       this.browser = null;
       this.context = null;
       this.page = null;

package/src/cli.ts CHANGED Viewed

@@ -622,6 +622,43 @@ program
     console.log(result);
   });
+// Download commands
+program
+  .command("download <selector>")
+  .description("Download file by clicking element (preserves auth/cookies)")
+  .option("-o, --output <path>", "Output path or directory")
+  .option("--timeout <ms>", "Download timeout in milliseconds", "60000")
+  .action(async (selector, options, command) => {
+    const result = await executeViaDaemon(
+      "download",
+      {
+        selector,
+        output: options.output,
+        timeout: Number.parseInt(options.timeout),
+      },
+      command.parent,
+    );
+    console.log(result);
+  });
+program
+  .command("download-url <url>")
+  .description("Download file directly from URL (preserves auth/cookies)")
+  .option("-o, --output <path>", "Output path or directory")
+  .option("--timeout <ms>", "Download timeout in milliseconds", "60000")
+  .action(async (url, options, command) => {
+    const result = await executeViaDaemon(
+      "download-url",
+      {
+        url,
+        output: options.output,
+        timeout: Number.parseInt(options.timeout),
+      },
+      command.parent,
+    );
+    console.log(result);
+  });
 // Network commands
 program
   .command("network-start")

package/src/commands/actions.ts CHANGED Viewed

@@ -43,7 +43,42 @@ async function getLocator(page: Page, selector: string): Promise<Locator> {
 export async function click(page: Page, selector: string): Promise<void> {
   const locator = await getLocator(page, selector);
-  await locator.click();
+  // 先尝试正常点击
+  try {
+    await locator.click({ timeout: 5000 });
+  } catch (error) {
+    // 如果被遮罩拦截，逐级降级
+    if (error instanceof Error && error.message.includes("intercepts pointer events")) {
+      console.log("Element intercepted, trying force click...");
+      try {
+        await locator.click({ force: true, timeout: 5000 });
+      } catch {
+        // force click 失败，使用完整鼠标事件序列（兼容 React 等框架）
+        console.log("Force click failed, using mouse event sequence...");
+        await locator.evaluate((el: HTMLElement) => {
+          const rect = el.getBoundingClientRect();
+          const x = rect.left + rect.width / 2;
+          const y = rect.top + rect.height / 2;
+          // 模拟完整鼠标事件序列
+          for (const type of ["mousedown", "mouseup", "click"]) {
+            el.dispatchEvent(
+              new MouseEvent(type, {
+                bubbles: true,
+                cancelable: true,
+                view: window,
+                clientX: x,
+                clientY: y,
+              }),
+            );
+          }
+        });
+      }
+    } else {
+      throw error;
+    }
+  }
 }
 export async function fill(page: Page, selector: string, value: string): Promise<void> {

package/src/commands/download.ts ADDED Viewed

@@ -0,0 +1,202 @@
+import { existsSync, mkdirSync } from "node:fs";
+import { basename, join } from "node:path";
+import type { Page } from "patchright";
+import { parseSelector } from "../utils/selector";
+// Global reference store for element mappings
+let globalReferenceStore: any = null;
+export function setReferenceStore(store: any) {
+  globalReferenceStore = store;
+}
+async function getLocator(page: Page, selector: string) {
+  const parsed = parseSelector(selector);
+  switch (parsed.type) {
+    case "ref": {
+      if (!globalReferenceStore) {
+        throw new Error(
+          `Element reference @${parsed.value} requires a snapshot first. Run 'hab snapshot -i' to generate element references.`,
+        );
+      }
+      const actualSelector = globalReferenceStore.get(parsed.value);
+      if (!actualSelector) {
+        throw new Error(
+          `Element reference @${parsed.value} not found. Run 'hab snapshot -i' to update element references.`,
+        );
+      }
+      return page.locator(actualSelector);
+    }
+    case "css":
+      return page.locator(parsed.value);
+    case "text":
+      return page.getByText(parsed.value);
+    case "xpath":
+      return page.locator(`xpath=${parsed.value}`);
+    default:
+      throw new Error(`Unknown selector type: ${parsed.type}`);
+  }
+}
+export interface DownloadOptions {
+  output?: string;
+  timeout?: number;
+}
+export interface DownloadResult {
+  success: boolean;
+  path: string;
+  filename: string;
+  size?: number;
+}
+/**
+ * Download file by clicking an element (link/button)
+ * Uses browser's native download capability to preserve cookies/auth
+ */
+export async function download(
+  page: Page,
+  selector: string,
+  options: DownloadOptions = {},
+): Promise<DownloadResult> {
+  const timeout = options.timeout || 60000;
+  // Determine output directory
+  let outputDir: string;
+  let specifiedFilename: string | undefined;
+  if (options.output) {
+    if (options.output.endsWith("/") || !options.output.includes(".")) {
+      // It's a directory
+      outputDir = options.output;
+    } else {
+      // It's a file path
+      outputDir = join(options.output, "..");
+      specifiedFilename = basename(options.output);
+    }
+  } else {
+    // Default to current directory
+    outputDir = process.cwd();
+  }
+  // Ensure output directory exists
+  if (!existsSync(outputDir)) {
+    mkdirSync(outputDir, { recursive: true });
+  }
+  const locator = await getLocator(page, selector);
+  // Wait for download event while clicking
+  const downloadPromise = page.waitForEvent("download", { timeout });
+  await locator.click();
+  const download = await downloadPromise;
+  // Get suggested filename or use specified one
+  const filename = specifiedFilename || download.suggestedFilename();
+  const outputPath = join(outputDir, filename);
+  // Save the file
+  await download.saveAs(outputPath);
+  // Get file stats if possible
+  let size: number | undefined;
+  try {
+    const stat = await Bun.file(outputPath).stat();
+    size = stat?.size;
+  } catch {
+    // Ignore stat errors
+  }
+  return {
+    success: true,
+    path: outputPath,
+    filename,
+    size,
+  };
+}
+/**
+ * Download file directly from URL
+ * Uses Playwright's request API to preserve cookies/auth (bypasses CORS)
+ */
+export async function downloadUrl(
+  page: Page,
+  url: string,
+  options: DownloadOptions = {},
+): Promise<DownloadResult> {
+  // Determine output directory and filename
+  let outputDir: string;
+  let specifiedFilename: string | undefined;
+  if (options.output) {
+    if (options.output.endsWith("/") || !options.output.includes(".")) {
+      outputDir = options.output;
+    } else {
+      outputDir = join(options.output, "..");
+      specifiedFilename = basename(options.output);
+    }
+  } else {
+    outputDir = process.cwd();
+  }
+  // Ensure output directory exists
+  if (!existsSync(outputDir)) {
+    mkdirSync(outputDir, { recursive: true });
+  }
+  // Use Playwright's request API (shares cookies with browser context, bypasses CORS)
+  const context = page.context();
+  const response = await context.request.get(url);
+  if (!response.ok()) {
+    throw new Error(`HTTP ${response.status()}: ${response.statusText()}`);
+  }
+  // Get filename from Content-Disposition header or URL
+  let filename = "";
+  const contentDisposition = response.headers()["content-disposition"];
+  if (contentDisposition) {
+    const match = contentDisposition.match(/filename[^;=\n]*=((['"]).*?\2|[^;\n]*)/);
+    if (match) {
+      filename = match[1].replace(/['"]/g, "");
+    }
+  }
+  // Determine final filename
+  filename = specifiedFilename || filename || basename(new URL(url).pathname) || "download";
+  const outputPath = join(outputDir, filename);
+  // Get body and write to file
+  const body = await response.body();
+  await Bun.write(outputPath, body);
+  return {
+    success: true,
+    path: outputPath,
+    filename,
+    size: body.byteLength,
+  };
+}
+/**
+ * Format file size for display
+ */
+function formatSize(bytes: number): string {
+  if (bytes < 1024) return `${bytes} B`;
+  if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`;
+  if (bytes < 1024 * 1024 * 1024) return `${(bytes / (1024 * 1024)).toFixed(1)} MB`;
+  return `${(bytes / (1024 * 1024 * 1024)).toFixed(1)} GB`;
+}
+/**
+ * Format download result for CLI output
+ */
+export function formatDownloadResult(result: DownloadResult): string {
+  const sizeStr = result.size ? ` (${formatSize(result.size)})` : "";
+  return `Downloaded: ${result.filename}${sizeStr}\nSaved to: ${result.path}`;
+}

package/src/daemon/server.ts CHANGED Viewed

@@ -1,5 +1,6 @@
 import * as actionCommands from "../commands/actions";
 import * as advancedCommands from "../commands/advanced";
+import * as downloadCommands from "../commands/download";
 import * as extractCommands from "../commands/extract";
 import * as getterCommands from "../commands/getters";
 import * as infoCommands from "../commands/info";
@@ -595,6 +596,26 @@ export class DaemonServer {
             result = await networkCommands.networkStop(args.listenerId);
             break;
+          // Download commands
+          case "download": {
+            downloadCommands.setReferenceStore(referenceStore);
+            const downloadResult = await downloadCommands.download(page, args.selector, {
+              output: args.output,
+              timeout: args.timeout,
+            });
+            result = downloadCommands.formatDownloadResult(downloadResult);
+            break;
+          }
+          case "download-url": {
+            const downloadUrlResult = await downloadCommands.downloadUrl(page, args.url, {
+              output: args.output,
+              timeout: args.timeout,
+            });
+            result = downloadCommands.formatDownloadResult(downloadUrlResult);
+            break;
+          }
           default:
             throw new Error(`Unknown command: ${command}`);
         }