hyper-agent-browser 0.3.1 → 0.4.0

package/README.md CHANGED
@@ -1,64 +1,66 @@
1
1
  # hyper-agent-browser (hab)
2
2
 
3
- **纯浏览器自动化 CLI,专为 AI Agent 设计**
3
+ **Pure Browser Automation CLI for AI Agents**
4
4
 
5
5
  [![npm version](https://img.shields.io/npm/v/hyper-agent-browser.svg)](https://www.npmjs.com/package/hyper-agent-browser)
6
6
  [![TypeScript](https://img.shields.io/badge/TypeScript-strict-blue.svg)](https://www.typescriptlang.org/)
7
7
  [![Bun](https://img.shields.io/badge/Bun-%3E%3D1.1.0-orange.svg)](https://bun.sh)
8
8
  [![License](https://img.shields.io/badge/license-MIT-green.svg)](./LICENSE)
9
9
 
10
- ## 特性
10
+ > 📖 [中文文档 (Chinese Documentation)](./docs/README_CN.md)
11
11
 
12
- - 🎯 **@eN 元素引用** - 无需手写选择器,自动生成 `@e1`, `@e2` 等引用
13
- - 🔐 **Session 持久化** - 保持登录状态,支持多账号隔离
14
- - 🎭 **反检测** - 基于 Patchright,绕过自动化检测
15
- - ⚡ **快速启动** - Bun 运行时,冷启动 ~25ms
16
- - 🤖 **AI Agent 友好** - 设计用于 Claude Code 等 AI Agent 调用
17
- - 🔒 **安全加固** - 沙箱隔离、权限控制、Session 保护
18
- - 📊 **数据提取** - 表格/列表/表单/元数据自动提取
19
- - 🌐 **网络监听** - 拦截 XHR/Fetch 请求,直接获取 API 数据
20
- - ⏳ **智能等待** - 网络空闲 + DOM 稳定双重策略
12
+ ## Features
21
13
 
22
- ## 🚀 快速开始
14
+ - 🎯 **@eN Element References** - No manual selectors needed, auto-generates `@e1`, `@e2` references
15
+ - 🔐 **Session Persistence** - Maintains login state, supports multi-account isolation
16
+ - 🎭 **Anti-Detection** - Built on Patchright, bypasses automation detection
17
+ - ⚡ **Fast Startup** - Bun runtime, cold start ~25ms
18
+ - 🤖 **AI Agent Friendly** - Designed for Claude Code and other AI agents
19
+ - 🔒 **Security Hardened** - Sandbox isolation, permission control, session protection
20
+ - 📊 **Data Extraction** - Auto-extract tables/lists/forms/metadata
21
+ - 🌐 **Network Monitoring** - Intercept XHR/Fetch requests, get API data directly
22
+ - ⏳ **Smart Waiting** - Network idle + DOM stable dual strategy
23
23
 
24
- ### 安装
24
+ ## 🚀 Quick Start
25
25
 
26
- **使用 npm(推荐)**
26
+ ### Installation
27
+
28
+ **Using npm (Recommended)**
27
29
 
28
30
  ```bash
29
- # 全局安装
31
+ # Global install
30
32
  npm install -g hyper-agent-browser
31
33
 
32
- # 或使用 Bun
34
+ # Or use Bun
33
35
  bun install -g hyper-agent-browser
34
36
 
35
- # 或使用 npx(无需安装)
37
+ # Or use npx (no install needed)
36
38
  npx hyper-agent-browser --version
37
39
  ```
38
40
 
39
- **从源码安装**
41
+ **From Source**
40
42
 
41
43
  ```bash
42
- git clone https://github.com/hubo1989/hyper-agent-browser.git
44
+ git clone https://github.com/anthropics/hyper-agent-browser.git
43
45
  cd hyper-agent-browser
44
46
  bun install
45
- bun run build # 构建二进制文件到 dist/hab
47
+ bun run build # Build binary to dist/hab
46
48
  ```
47
49
 
48
- **下载预编译二进制文件**
50
+ **Download Pre-built Binary**
49
51
 
50
- 访问 [GitHub Releases](https://github.com/hubo1989/hyper-agent-browser/releases) 下载对应平台的二进制文件。
52
+ Visit [GitHub Releases](https://github.com/anthropics/hyper-agent-browser/releases) to download binaries for your platform.
51
53
 
52
- ### 基础使用
54
+ ### Basic Usage
53
55
 
54
56
  ```bash
55
- # 1. 打开网页(有头模式,可以看到浏览器)
57
+ # 1. Open a webpage (headed mode to see browser)
56
58
  hab --headed open https://google.com
57
59
 
58
- # 2. 获取可交互元素快照
60
+ # 2. Get interactive elements snapshot
59
61
  hab snapshot -i
60
62
 
61
- # 输出示例:
63
+ # Output example:
62
64
  # URL: https://google.com
63
65
  # Title: Google
64
66
  #
@@ -69,369 +71,283 @@ hab snapshot -i
69
71
  # @e4 [link] "Gmail"
70
72
  # @e5 [link] "Images"
71
73
 
72
- # 3. 使用 @eN 引用操作元素
74
+ # 3. Use @eN references to interact
73
75
  hab fill @e1 "Bun JavaScript runtime"
74
76
  hab press Enter
75
77
 
76
- # 4. 等待页面加载
78
+ # 4. Wait for page load
77
79
  hab wait 2000
78
80
 
79
- # 5. 截图
81
+ # 5. Take screenshot
80
82
  hab screenshot -o result.png
81
-
82
- # 6. 获取页面内容
83
- hab content
84
83
  ```
85
84
 
86
- ### Session 管理(多账号隔离)
85
+ ### Session Management (Multi-Account Isolation)
87
86
 
88
87
  ```bash
89
- # 个人 Gmail 账号
88
+ # Personal Gmail account
90
89
  hab -s personal-gmail open https://mail.google.com
91
90
  hab -s personal-gmail snapshot -i
92
91
 
93
- # 工作 Gmail 账号
92
+ # Work Gmail account
94
93
  hab -s work-gmail open https://mail.google.com
95
94
  hab -s work-gmail snapshot -i
96
95
 
97
- # 列出所有 Session
96
+ # List all sessions
98
97
  hab sessions
99
98
 
100
- # 关闭特定 Session
99
+ # Close specific session
101
100
  hab close -s personal-gmail
102
101
  ```
103
102
 
104
- ### 数据提取(新增)
103
+ ### Data Extraction
105
104
 
106
105
  ```bash
107
- # 提取表格数据
106
+ # Extract table data
108
107
  hab open https://example.com/users
109
108
  hab extract-table > users.json
110
109
 
111
- # 提取列表数据(自动检测商品/文章列表)
110
+ # Extract list data (auto-detect product/article lists)
112
111
  hab extract-list --selector ".product-list" > products.json
113
112
 
114
- # 提取表单状态
113
+ # Extract form state
115
114
  hab extract-form > form_data.json
116
115
 
117
- # 提取页面元数据(SEO/OG/Schema.org
116
+ # Extract page metadata (SEO/OG/Schema.org)
118
117
  hab extract-meta --include seo,og > metadata.json
119
118
  ```
120
119
 
121
- ### 网络监听(新增)
120
+ ### Network Monitoring
122
121
 
123
122
  ```bash
124
- # 启动网络监听
123
+ # Start network listener
125
124
  LISTENER_ID=$(hab network-start --filter xhr,fetch --url-pattern "*/api/*" | jq -r '.listenerId')
126
125
 
127
- # 执行操作(翻页/点击等)
126
+ # Perform actions (pagination/clicks)
128
127
  hab click @e5
129
128
  hab wait-idle
130
129
 
131
- # 停止监听并获取所有 API 数据
130
+ # Stop listener and get all API data
132
131
  hab network-stop $LISTENER_ID > api_data.json
133
132
  ```
134
133
 
135
- ### 智能等待(新增)
134
+ ### Smart Waiting
136
135
 
137
136
  ```bash
138
- # 等待页面完全空闲(网络 + DOM
137
+ # Wait for page fully idle (network + DOM)
139
138
  hab wait-idle --timeout 30000
140
139
 
141
- # 等待元素可见
140
+ # Wait for element visible
142
141
  hab wait-element "css=.data-row" --state visible
143
142
 
144
- # 等待加载动画消失
143
+ # Wait for loading animation to disappear
145
144
  hab wait-element "css=.loading" --state detached
146
145
  ```
147
146
 
148
- ## 📖 完整命令列表
147
+ ## 📖 Command Reference
149
148
 
150
- ### 导航命令
149
+ ### Navigation Commands
151
150
 
152
- | 命令 | 说明 | 示例 |
153
- |------|------|------|
154
- | `open <url>` | 打开网页 | `hab open https://example.com` |
155
- | `reload` | 刷新当前页面 | `hab reload` |
156
- | `back` | 后退 | `hab back` |
157
- | `forward` | 前进 | `hab forward` |
151
+ | Command | Description | Example |
152
+ |---------|-------------|---------|
153
+ | `open <url>` | Open webpage | `hab open https://example.com` |
154
+ | `reload` | Refresh current page | `hab reload` |
155
+ | `back` | Go back | `hab back` |
156
+ | `forward` | Go forward | `hab forward` |
158
157
 
159
- ### 操作命令
158
+ ### Action Commands
160
159
 
161
- | 命令 | 说明 | 示例 |
162
- |------|------|------|
163
- | `click <selector>` | 点击元素 | `hab click @e1` |
164
- | `fill <selector> <value>` | 填充输入框 | `hab fill @e1 "hello"` |
165
- | `type <text>` | 逐字输入文本 | `hab type "password"` |
166
- | `press <key>` | 按键 | `hab press Enter` |
167
- | `scroll <direction> [amount]` | 滚动页面 | `hab scroll down 500` |
168
- | `hover <selector>` | 悬停在元素上 | `hab hover @e3` |
169
- | `select <selector> <value>` | 选择下拉选项 | `hab select @e2 "Option 1"` |
170
- | `wait <ms\|condition>` | 等待时间或条件 | `hab wait 3000` |
160
+ | Command | Description | Example |
161
+ |---------|-------------|---------|
162
+ | `click <selector>` | Click element | `hab click @e1` |
163
+ | `fill <selector> <value>` | Fill input field | `hab fill @e1 "hello"` |
164
+ | `type <text>` | Type text character by character | `hab type "password"` |
165
+ | `press <key>` | Press key | `hab press Enter` |
166
+ | `scroll <direction> [amount]` | Scroll page | `hab scroll down 500` |
167
+ | `hover <selector>` | Hover over element | `hab hover @e3` |
168
+ | `select <selector> <value>` | Select dropdown option | `hab select @e2 "Option 1"` |
169
+ | `wait <ms\|condition>` | Wait for time or condition | `hab wait 3000` |
171
170
 
172
- ### 信息命令
171
+ ### Info Commands
173
172
 
174
- | 命令 | 说明 | 示例 |
175
- |------|------|------|
176
- | `snapshot [-i\|--interactive]` | 获取页面快照 | `hab snapshot -i` |
177
- | `screenshot [-o <file>] [--full-page]` | 截图 | `hab screenshot -o page.png` |
178
- | `url` | 获取当前 URL | `hab url` |
179
- | `title` | 获取页面标题 | `hab title` |
180
- | `content [selector]` | 获取文本内容 | `hab content` |
181
- | `evaluate <script>` | 执行 JavaScript | `hab evaluate "document.title"` |
173
+ | Command | Description | Example |
174
+ |---------|-------------|---------|
175
+ | `snapshot [-i\|--interactive]` | Get page snapshot | `hab snapshot -i` |
176
+ | `screenshot [-o <file>] [--full-page]` | Take screenshot | `hab screenshot -o page.png` |
177
+ | `url` | Get current URL | `hab url` |
178
+ | `title` | Get page title | `hab title` |
179
+ | `evaluate <script>` | Execute JavaScript | `hab evaluate "document.title"` |
182
180
 
183
- ### Session 命令
181
+ ### Session Commands
184
182
 
185
- | 命令 | 说明 | 示例 |
186
- |------|------|------|
187
- | `sessions` | 列出所有 Session | `hab sessions` |
188
- | `close [-s <name>]` | 关闭 Session | `hab close -s gmail` |
183
+ | Command | Description | Example |
184
+ |---------|-------------|---------|
185
+ | `sessions` | List all sessions | `hab sessions` |
186
+ | `close [-s <name>]` | Close session | `hab close -s gmail` |
189
187
 
190
- ### 全局选项
188
+ ### Global Options
191
189
 
192
- | 选项 | 说明 | 默认值 |
193
- |------|------|--------|
194
- | `-s, --session <name>` | 指定 Session 名称 | `default` |
195
- | `--headed` | 有头模式(显示浏览器) | `false` |
196
- | `--channel <chrome\|msedge>` | 浏览器类型 | `chrome` |
197
- | `--timeout <ms>` | 超时时间 | `30000` |
190
+ | Option | Description | Default |
191
+ |--------|-------------|---------|
192
+ | `-s, --session <name>` | Session name | `default` |
193
+ | `--headed` | Headed mode (show browser) | `false` |
194
+ | `--channel <chrome\|msedge>` | Browser type | `chrome` |
195
+ | `--timeout <ms>` | Timeout | `30000` |
198
196
 
199
- ## 🤖 AI Agent 集成(Claude Code
197
+ ## 🤖 AI Agent Integration (Claude Code)
200
198
 
201
- hyper-agent-browser 专为 AI Agent 设计,可与 Claude Code 无缝集成。
199
+ hyper-agent-browser is designed for AI agents and integrates seamlessly with Claude Code.
202
200
 
203
- ### 安装 Skill 文件
201
+ ### Install Skill File
204
202
 
205
203
  ```bash
206
- # 方法 1:从本地仓库复制
204
+ # Method 1: Copy from local repo
207
205
  mkdir -p ~/.claude/skills/hyper-agent-browser
208
- cp skills/hyper-browser.md ~/.claude/skills/hyper-agent-browser/skill.md
206
+ cp skills/hyper-agent-browser.md ~/.claude/skills/hyper-agent-browser/skill.md
209
207
 
210
- # 方法 2:直接下载
208
+ # Method 2: Direct download
211
209
  mkdir -p ~/.claude/skills/hyper-agent-browser
212
210
  curl -o ~/.claude/skills/hyper-agent-browser/skill.md \
213
- https://raw.githubusercontent.com/hubo1989/hyper-agent-browser/main/skills/hyper-browser.md
211
+ https://raw.githubusercontent.com/anthropics/hyper-agent-browser/main/skills/hyper-agent-browser.md
214
212
  ```
215
213
 
216
- ### 使用示例
214
+ ### Usage Examples
217
215
 
218
- 安装 Skill 后,Claude Code 会自动识别并使用 `hab` 命令。你可以这样指示 Claude:
216
+ After installing the skill, Claude Code will automatically recognize and use `hab` commands:
219
217
 
220
218
  ```
221
- "帮我打开 Google 搜索 'Bun runtime' 并截图"
222
- "登录我的 Gmail 账号,找到未读邮件数量"
223
- "访问 Twitter 并获取首页的所有推文标题"
219
+ "Help me open Google, search for 'Bun runtime' and take a screenshot"
220
+ "Log into my Gmail account and find the number of unread emails"
221
+ "Visit Twitter and get all tweet titles from the homepage"
224
222
  ```
225
223
 
226
- Claude 会自动:
227
- 1. 使用 `hab open` 打开网页
228
- 2. 使用 `hab snapshot -i` 获取元素引用
229
- 3. 分析快照,找到目标元素(如 `@e5`)
230
- 4. 使用 `hab click @e5` 等命令完成操作
231
-
232
- ### Skill 功能
233
-
234
- - ✅ 自动解析 `@eN` 引用
235
- - ✅ Session 管理(多账号隔离)
236
- - ✅ 错误处理和重试
237
- - ✅ 浏览器状态保持
238
- - ✅ 登录状态持久化
239
-
240
- ## 📋 选择器格式
241
-
242
- hyper-agent-browser 支持多种选择器格式:
243
-
244
- | 格式 | 示例 | 说明 | 推荐度 |
245
- |------|------|------|--------|
246
- | `@eN` | `@e1`, `@e5` | 元素引用(来自 snapshot) | ⭐⭐⭐⭐⭐ |
247
- | `css=` | `css=#login` | CSS 选择器 | ⭐⭐⭐ |
248
- | `text=` | `text=Sign in` | 文本匹配 | ⭐⭐⭐⭐ |
249
- | `xpath=` | `xpath=//button` | XPath 选择器 | ⭐⭐ |
250
-
251
- **推荐使用 `@eN` 引用**:
252
- - 无需手写选择器
253
- - 自动处理动态 ID/Class
254
- - AI Agent 友好
255
-
256
- ## 🎯 核心功能详解
257
-
258
- ### 1. 元素引用系统
259
-
260
- 不需要手写复杂的选择器:
261
-
262
- ```bash
263
- # 传统方式(繁琐、易出错)
264
- hab click 'css=button.MuiButton-root.MuiButton-contained.MuiButton-sizeMedium'
265
-
266
- # hyper-agent-browser 方式(简单、可靠)
267
- hab snapshot -i # 自动生成 @e1, @e2... 引用
268
- hab click @e5 # 直接使用引用
269
- ```
270
-
271
- ### 2. Session 持久化
272
-
273
- 每个 Session 有独立的:
274
- - 浏览器实例
275
- - UserData 目录(Cookies/LocalStorage)
276
- - 登录状态
277
- - 浏览历史
278
-
279
- ```
280
- ~/.hab/sessions/
281
- ├── default/
282
- │ ├── userdata/ # Chrome UserData
283
- │ ├── session.json # 元数据(wsEndpoint/pid/url)
284
- │ └── element-refs.json # @eN 映射
285
- ├── gmail-personal/
286
- └── gmail-work/
287
- ```
288
-
289
- ### 3. 浏览器复用
290
-
291
- CLI 每次调用是独立进程,但浏览器实例会持久化复用:
292
-
293
- ```bash
294
- # 第一次:启动新浏览器 (~1-2s)
295
- hab --headed open https://google.com
296
-
297
- # 后续调用:复用浏览器 (~50ms)
298
- hab snapshot -i
299
- hab click @e1
300
- ```
301
-
302
- ## 🔒 安全特性
303
-
304
- hyper-agent-browser v0.1.0 包含全面的安全加固:
305
-
306
- ### 1. evaluate 命令沙箱
307
-
308
- - ✅ 白名单模式(仅允许安全的 document/window 操作)
309
- - ✅ 增强黑名单(阻止 eval/Function/constructor/globalThis)
310
- - ✅ 结果大小限制(最大 100KB,防止数据窃取)
311
-
312
- ### 2. Session 文件权限保护
313
-
314
- - ✅ session.json 权限设置为 `0o600`(仅所有者可读写)
315
- - ✅ 保护 wsEndpoint 不被其他进程劫持
316
-
317
- ### 3. 配置文件权限保护
318
-
319
- - ✅ config.json 权限设置为 `0o600`
320
- - ✅ 保护敏感配置
321
-
322
- ### 4. Chrome 扩展安全验证
224
+ Claude will automatically:
225
+ 1. Use `hab open` to open the webpage
226
+ 2. Use `hab snapshot -i` to get element references
227
+ 3. Analyze the snapshot to find target elements (e.g., `@e5`)
228
+ 4. Use `hab click @e5` and other commands to complete the task
323
229
 
324
- - 扩展白名单机制
325
- - ✅ 自动检查扩展 manifest 危险权限
326
- - ✅ 过滤含 debugger/webRequest/proxy 权限的扩展
230
+ ## 📋 Selector Format
327
231
 
328
- ### 5. 系统 Keychain 隔离
232
+ | Format | Example | Description | Recommended |
233
+ |--------|---------|-------------|-------------|
234
+ | `@eN` | `@e1`, `@e5` | Element reference (from snapshot) | ⭐⭐⭐⭐⭐ |
235
+ | `css=` | `css=#login` | CSS selector | ⭐⭐⭐ |
236
+ | `text=` | `text=Sign in` | Text match | ⭐⭐⭐⭐ |
237
+ | `xpath=` | `xpath=//button` | XPath selector | ⭐⭐ |
329
238
 
330
- - 默认使用隔离的密码存储
331
- - 通过 `HAB_USE_SYSTEM_KEYCHAIN=true` 显式启用
239
+ **Recommended: Use `@eN` references**:
240
+ - No manual selector writing
241
+ - Auto-handles dynamic IDs/Classes
242
+ - AI Agent friendly
332
243
 
333
- ### 6. 配置键白名单验证
244
+ ## 🔒 Security Features
334
245
 
335
- - ✅ 仅允许修改安全的配置键
336
- - ✅ 阻止危险浏览器参数注入
246
+ - ✅ **evaluate Sandbox** - Whitelist mode, blocks dangerous operations
247
+ - ✅ **Session File Protection** - Permissions set to `0o600`
248
+ - ✅ **Chrome Extension Verification** - Whitelist + dangerous permission filtering
249
+ - ✅ **System Keychain Isolation** - Isolated password storage by default
250
+ - ✅ **Config Key Whitelist** - Prevents dangerous browser argument injection
337
251
 
338
- ## 🏗️ 架构
252
+ ## 🏗️ Architecture
339
253
 
340
254
  ```
341
255
  src/
342
- ├── cli.ts # CLI 入口(Commander.js
256
+ ├── cli.ts # CLI entry (Commander.js)
343
257
  ├── browser/
344
- ├── manager.ts # 浏览器生命周期管理
345
- │ └── context.ts # BrowserContext 封装
258
+ │ └── manager.ts # Browser lifecycle management
259
+ ├── daemon/
260
+ │ ├── server.ts # Daemon server
261
+ │ ├── client.ts # Daemon client
262
+ │ └── browser-pool.ts # Browser instance pool
346
263
  ├── session/
347
- │ ├── manager.ts # Session 管理(多浏览器实例)
348
- │ └── store.ts # UserData 持久化
264
+ │ ├── manager.ts # Session management
265
+ │ └── store.ts # UserData persistence
349
266
  ├── commands/
350
267
  │ ├── navigation.ts # open/reload/back/forward
351
268
  │ ├── actions.ts # click/fill/type/press/scroll
352
269
  │ ├── info.ts # snapshot/screenshot/evaluate
353
- └── session.ts # sessions/close
270
+ │ ├── extract.ts # Data extraction commands
271
+ │ └── network.ts # Network monitoring
354
272
  ├── snapshot/
355
- │ ├── accessibility.ts # Accessibility Tree 提取元素
356
- │ ├── dom-extractor.ts # DOM 提取器(fallback
357
- ├── formatter.ts # 格式化输出
358
- │ └── reference-store.ts # @eN 映射存储
273
+ │ ├── accessibility.ts # Extract from Accessibility Tree
274
+ │ ├── dom-extractor.ts # DOM extractor (fallback)
275
+ │ └── reference-store.ts # @eN mapping storage
359
276
  └── utils/
360
- ├── selector.ts # 选择器解析
361
- ├── config.ts # 配置管理
362
- ├── errors.ts # 错误处理
363
- └── logger.ts # 日志
277
+ ├── selector.ts # Selector parsing
278
+ ├── config.ts # Config management
279
+ └── errors.ts # Error handling
364
280
  ```
365
281
 
366
- ## 📊 技术栈
282
+ ## 📊 Tech Stack
367
283
 
368
- - **Bun** 1.2.21 - JavaScript 运行时
369
- - **Patchright** 1.57.0 - 反检测 Playwright fork
370
- - **Commander.js** 12.1.0 - CLI 框架
371
- - **Zod** 3.25.76 - 数据验证
372
- - **Biome** 1.9.4 - 代码规范
284
+ - **Bun** 1.2.21 - JavaScript runtime
285
+ - **Patchright** 1.57.0 - Anti-detection Playwright fork
286
+ - **Commander.js** 12.1.0 - CLI framework
287
+ - **Zod** 3.25.76 - Data validation
288
+ - **Biome** 1.9.4 - Code linting
373
289
 
374
- ## 🛠️ 开发
290
+ ## 🛠️ Development
375
291
 
376
292
  ```bash
377
- # 克隆仓库
378
- git clone https://github.com/hubo1989/hyper-agent-browser.git
293
+ # Clone repo
294
+ git clone https://github.com/anthropics/hyper-agent-browser.git
379
295
  cd hyper-agent-browser
380
296
 
381
- # 安装依赖
297
+ # Install dependencies
382
298
  bun install
383
299
 
384
- # 开发模式运行
300
+ # Development mode
385
301
  bun dev -- --headed open https://google.com
386
302
 
387
- # 运行测试
303
+ # Run tests
388
304
  bun test
389
305
 
390
- # 类型检查
306
+ # Type check
391
307
  bun run typecheck
392
308
 
393
- # 代码规范检查
309
+ # Lint
394
310
  bun run lint
395
311
 
396
- # 构建
397
- bun run build # 当前平台
398
- bun run build:all # 所有平台
312
+ # Build
313
+ bun run build # Current platform
314
+ bun run build:all # All platforms
399
315
  ```
400
316
 
401
- ## 📚 文档
317
+ ## 📚 Documentation
402
318
 
403
- - [GETTING_STARTED.md](./GETTING_STARTED.md) - 快速入门指南
404
- - [ELEMENT_REFERENCE_GUIDE.md](./ELEMENT_REFERENCE_GUIDE.md) - @eN 引用完整文档
405
- - [GOOGLE_PROFILE_GUIDE.md](./GOOGLE_PROFILE_GUIDE.md) - Google Profile 集成
406
- - [CLAUDE.md](./CLAUDE.md) - 开发者文档
407
- - [hyper-agent-browser-spec.md](./hyper-agent-browser-spec.md) - 技术规格
408
- - [Skill 文档](./skills/hyper-browser.md) - Claude Code Skill 说明
319
+ - [Quick Start Guide](./GETTING_STARTED.md)
320
+ - [Element Reference Guide](./ELEMENT_REFERENCE_GUIDE.md)
321
+ - [Google Profile Integration](./GOOGLE_PROFILE_GUIDE.md)
322
+ - [Developer Docs](./CLAUDE.md)
323
+ - [Technical Spec](./hyper-agent-browser-spec.md)
324
+ - [Skill Documentation](./skills/hyper-agent-browser.md)
325
+ - [中文文档 (Chinese)](./docs/README_CN.md)
409
326
 
410
- ## 🤝 贡献
327
+ ## 🤝 Contributing
411
328
 
412
- 欢迎 Pull Requests!请确保:
329
+ Pull Requests welcome! Please ensure:
413
330
 
414
- - ✅ TypeScript 类型检查通过:`bun run typecheck`
415
- - ✅ 测试通过:`bun test`
416
- - ✅ 代码规范检查通过:`bun run lint`
331
+ - ✅ TypeScript type check passes: `bun run typecheck`
332
+ - ✅ Tests pass: `bun test`
333
+ - ✅ Lint passes: `bun run lint`
417
334
 
418
- ## 📄 许可证
335
+ ## 📄 License
419
336
 
420
337
  [MIT](./LICENSE)
421
338
 
422
- ## 🔗 相关链接
339
+ ## 🔗 Links
423
340
 
424
- - **npm 包**: https://www.npmjs.com/package/hyper-agent-browser
425
- - **GitHub**: https://github.com/hubo1989/hyper-agent-browser
426
- - **Issues**: https://github.com/hubo1989/hyper-agent-browser/issues
427
- - **Releases**: https://github.com/hubo1989/hyper-agent-browser/releases
341
+ - **npm**: https://www.npmjs.com/package/hyper-agent-browser
342
+ - **GitHub**: https://github.com/anthropics/hyper-agent-browser
343
+ - **Issues**: https://github.com/anthropics/hyper-agent-browser/issues
344
+ - **Releases**: https://github.com/anthropics/hyper-agent-browser/releases
428
345
 
429
- ## 🙏 致谢
346
+ ## 🙏 Acknowledgments
430
347
 
431
- - [Patchright](https://github.com/Patchright/patchright) - 反检测 Playwright fork
432
- - [agent-browser](https://github.com/anthropics/agent-browser) - CLI 设计灵感
433
- - [Bun](https://bun.sh) - 快速的 JavaScript 运行时
434
- - [Claude Code](https://claude.ai/code) - AI 编程助手
348
+ - [Patchright](https://github.com/Patchright/patchright) - Anti-detection Playwright fork
349
+ - [Bun](https://bun.sh) - Fast JavaScript runtime
350
+ - [Claude Code](https://claude.ai/code) - AI programming assistant
435
351
 
436
352
  ---
437
353
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "hyper-agent-browser",
3
- "version": "0.3.1",
3
+ "version": "0.4.0",
4
4
  "description": "Pure browser automation CLI for AI Agents - 纯浏览器自动化 CLI,专为 AI Agent 设计",
5
5
  "type": "module",
6
6
  "main": "src/cli.ts",
@@ -149,14 +149,23 @@ export class BrowserManager {
149
149
  }
150
150
 
151
151
  // Use launchPersistentContext for UserData persistence
152
- this.context = await chromium.launchPersistentContext(this.session.userDataDir, {
152
+ // Guard the launch with a timeout (15 seconds)
153
+ const launchPromise = chromium.launchPersistentContext(this.session.userDataDir, {
153
154
  channel: this.options.channel,
154
155
  headless: !this.options.headed,
155
156
  args: launchArgs,
156
157
  ignoreDefaultArgs: ignoreArgs,
157
158
  viewport: { width: 1280, height: 720 },
159
+ timeout: 15000, // 15-second launch timeout
158
160
  });
159
161
 
162
+ this.context = await Promise.race([
163
+ launchPromise,
164
+ new Promise<never>((_, reject) =>
165
+ setTimeout(() => reject(new Error("Browser launch timeout (15s)")), 15000),
166
+ ),
167
+ ]);
168
+
160
169
  // Extract browser from context
161
170
  // @ts-ignore - context has _browser property
162
171
  this.browser = this.context._browser;
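Review note: the launch hunk above races `launchPersistentContext` against a `setTimeout` that is never cleared, so the timer stays pending for the full 15 seconds even after a successful launch. A minimal sketch of a reusable wrapper that clears its timer follows. `withTimeout` and its `label` parameter are hypothetical names for illustration, not part of hyper-agent-browser.

```typescript
// Hypothetical helper (not in the package): race a promise against a timer
// and always clear the timer once the race has settled.
export async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // This promise only ever rejects, hence Promise<never>.
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });

  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Runs on success, failure, and timeout alike.
    clearTimeout(timer);
  }
}
```

Under this assumption the launch call could read `await withTimeout(launchPromise, 15000, "Browser launch")`, and the same wrapper would fit the 5-second close timeout.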
@@ -334,6 +343,22 @@ export class BrowserManager {
334
343
  // Reconnect
335
344
  await this.connect();
336
345
  }
346
+
347
+ // Make sure the currently active page is returned (with several pages open, pick the most recent one)
348
+ if (this.context) {
349
+ const pages = this.context.pages();
350
+ if (pages.length > 0) {
351
+ // Prefer a page that is not about:blank
352
+ const activePage = pages.find((p) => p.url() !== "about:blank") || pages[pages.length - 1];
353
+ if (activePage !== this.page) {
354
+ this.page = activePage;
355
+ if (this.options.timeout) {
356
+ this.page.setDefaultTimeout(this.options.timeout);
357
+ }
358
+ }
359
+ }
360
+ }
361
+
337
362
  return this.page!;
338
363
  }
339
364
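Review note: the page-selection hunk above uses `pages.find(...)`, which returns the first non-blank page, while its comment speaks of getting the most recent one. The sketch below isolates both behaviours as pure functions so they can be compared. `PageLike`, `pickActivePage`, and `pickNewestNonBlankPage` are hypothetical names, and the second function assumes `pages()` lists pages in the order they were opened, which the hunk's `pages[pages.length - 1]` fallback also appears to assume.

```typescript
// Minimal structural type: anything exposing url() is enough for selection.
interface PageLike {
  url(): string;
}

// Same rule as the hunk: first non-blank page, else the last page.
export function pickActivePage<P extends PageLike>(pages: readonly P[]): P | undefined {
  if (pages.length === 0) return undefined;
  return pages.find((p) => p.url() !== "about:blank") ?? pages[pages.length - 1];
}

// Alternative rule: scan from the end so the newest non-blank page wins.
export function pickNewestNonBlankPage<P extends PageLike>(pages: readonly P[]): P | undefined {
  for (let i = pages.length - 1; i >= 0; i--) {
    if (pages[i].url() !== "about:blank") return pages[i];
  }
  // Every page is blank (or the list is empty): fall back to the last entry.
  return pages[pages.length - 1];
}
```

The two functions differ only when more than one non-blank page is open, for example after a link opens a new tab.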
 
@@ -353,7 +378,30 @@ export class BrowserManager {
353
378
 
354
379
  async close(): Promise<void> {
355
380
  if (this.browser) {
356
- await this.browser.close();
381
+ // Grab the PID so the process can be force-killed after a timeout
382
+ const pid = this.getPid();
383
+
384
+ try {
385
+ // Give the close operation a 5-second timeout
386
+ await Promise.race([
387
+ this.browser.close(),
388
+ new Promise((_, reject) =>
389
+ setTimeout(() => reject(new Error("Browser close timeout")), 5000),
390
+ ),
391
+ ]);
392
+ } catch (error) {
393
+ console.log("Browser close failed or timed out, forcing cleanup...");
394
+ // Force-kill the process
395
+ if (pid) {
396
+ try {
397
+ process.kill(pid, "SIGKILL");
398
+ console.log(`Force killed browser process (PID: ${pid})`);
399
+ } catch {
400
+ // The process may have already exited
401
+ }
402
+ }
403
+ }
404
+
357
405
  this.browser = null;
358
406
  this.context = null;
359
407
  this.page = null;
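Review note: the close hunk above sends `SIGKILL` and swallows any error, so the caller cannot tell whether the browser process was still running. A small sketch of an existence check is given below, relying on the Node.js convention that signal `0` tests for a process without delivering anything. `isProcessAlive` is a hypothetical helper, not part of the package.

```typescript
// Hypothetical helper: report whether a process with this PID currently exists.
export function isProcessAlive(pid: number): boolean {
  try {
    // Signal 0 performs the existence and permission check only.
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    // ESRCH (or anything else): treat the process as gone.
    return (err as { code?: string }).code === "EPERM";
  }
}
```

One possible use is to call it before `process.kill(pid, "SIGKILL")`, so the "Force killed" log line is printed only when there was something to kill.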
package/src/cli.ts CHANGED
@@ -622,6 +622,43 @@ program
622
622
  console.log(result);
623
623
  });
624
624
 
625
+ // Download commands
626
+ program
627
+ .command("download <selector>")
628
+ .description("Download file by clicking element (preserves auth/cookies)")
629
+ .option("-o, --output <path>", "Output path or directory")
630
+ .option("--timeout <ms>", "Download timeout in milliseconds", "60000")
631
+ .action(async (selector, options, command) => {
632
+ const result = await executeViaDaemon(
633
+ "download",
634
+ {
635
+ selector,
636
+ output: options.output,
637
+ timeout: Number.parseInt(options.timeout),
638
+ },
639
+ command.parent,
640
+ );
641
+ console.log(result);
642
+ });
643
+
644
+ program
645
+ .command("download-url <url>")
646
+ .description("Download file directly from URL (preserves auth/cookies)")
647
+ .option("-o, --output <path>", "Output path or directory")
648
+ .option("--timeout <ms>", "Download timeout in milliseconds", "60000")
649
+ .action(async (url, options, command) => {
650
+ const result = await executeViaDaemon(
651
+ "download-url",
652
+ {
653
+ url,
654
+ output: options.output,
655
+ timeout: Number.parseInt(options.timeout),
656
+ },
657
+ command.parent,
658
+ );
659
+ console.log(result);
660
+ });
661
+
625
662
  // Network commands
626
663
  program
627
664
  .command("network-start")
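Review note: both download commands above convert `--timeout` with `Number.parseInt`, which yields `NaN` for non-numeric input. One way to validate the value at the CLI boundary is sketched here. `parseTimeoutMs` and its `fallbackMs` parameter are hypothetical, and the 60000 default simply mirrors the option default shown in the hunk.

```typescript
// Hypothetical helper: turn a CLI string into a positive millisecond count,
// falling back to a default when the input is missing or malformed.
export function parseTimeoutMs(raw: string | undefined, fallbackMs = 60000): number {
  if (raw === undefined) return fallbackMs;
  const value = Number.parseInt(raw, 10); // explicit radix: always decimal
  return Number.isFinite(value) && value > 0 ? value : fallbackMs;
}
```

With this sketch the action handlers would pass `parseTimeoutMs(options.timeout)` instead of the raw `parseInt` result.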
@@ -43,7 +43,42 @@ async function getLocator(page: Page, selector: string): Promise<Locator> {
43
43
 
44
44
  export async function click(page: Page, selector: string): Promise<void> {
45
45
  const locator = await getLocator(page, selector);
46
- await locator.click();
46
+
47
+ // Try a normal click first
48
+ try {
49
+ await locator.click({ timeout: 5000 });
50
+ } catch (error) {
51
+ // If an overlay intercepts the click, fall back step by step
52
+ if (error instanceof Error && error.message.includes("intercepts pointer events")) {
53
+ console.log("Element intercepted, trying force click...");
54
+ try {
55
+ await locator.click({ force: true, timeout: 5000 });
56
+ } catch {
57
+ // Force click failed; dispatch the full mouse event sequence (works with frameworks such as React)
58
+ console.log("Force click failed, using mouse event sequence...");
59
+ await locator.evaluate((el: HTMLElement) => {
60
+ const rect = el.getBoundingClientRect();
61
+ const x = rect.left + rect.width / 2;
62
+ const y = rect.top + rect.height / 2;
63
+
64
+ // Simulate the full mouse event sequence
65
+ for (const type of ["mousedown", "mouseup", "click"]) {
66
+ el.dispatchEvent(
67
+ new MouseEvent(type, {
68
+ bubbles: true,
69
+ cancelable: true,
70
+ view: window,
71
+ clientX: x,
72
+ clientY: y,
73
+ }),
74
+ );
75
+ }
76
+ });
77
+ }
78
+ } else {
79
+ throw error;
80
+ }
81
+ }
47
82
  }
48
83
 
49
84
  export async function fill(page: Page, selector: string, value: string): Promise<void> {
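Review note: the click hunk above nests three strategies (normal click, forced click, synthetic mouse events) inside each other. A flattened sketch of the same fallback order is shown below, written against a small interface so it can be exercised without a browser. `ClickTarget`, `dispatchClick`, `ClickStrategy`, and `clickWithFallback` are hypothetical names; `dispatchClick` stands in for the `locator.evaluate(...)` event sequence in the hunk.

```typescript
// Hypothetical abstraction over a locator, just enough for the fallback chain.
interface ClickTarget {
  click(options?: { force?: boolean; timeout?: number }): Promise<void>;
  dispatchClick(): Promise<void>; // stand-in for the synthetic mouse event sequence
}

export type ClickStrategy = "normal" | "force" | "synthetic";

// Try each strategy in order and report which one succeeded.
export async function clickWithFallback(
  target: ClickTarget,
  log: (message: string) => void = console.log,
): Promise<ClickStrategy> {
  try {
    await target.click({ timeout: 5000 });
    return "normal";
  } catch (error) {
    // Only an interception error triggers the fallbacks; anything else propagates.
    const intercepted =
      error instanceof Error && error.message.includes("intercepts pointer events");
    if (!intercepted) throw error;
  }

  log("Element intercepted, trying force click...");
  try {
    await target.click({ force: true, timeout: 5000 });
    return "force";
  } catch {
    log("Force click failed, using mouse event sequence...");
    await target.dispatchClick();
    return "synthetic";
  }
}
```

Returning the strategy name is a design choice of this sketch: it would let the daemon report when a click needed a fallback instead of only logging it.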
@@ -0,0 +1,202 @@
1
+ import { existsSync, mkdirSync } from "node:fs";
2
+ import { basename, join } from "node:path";
3
+ import type { Page } from "patchright";
4
+ import { parseSelector } from "../utils/selector";
5
+
6
+ // Global reference store for element mappings
7
+ let globalReferenceStore: any = null;
8
+
9
+ export function setReferenceStore(store: any) {
10
+ globalReferenceStore = store;
11
+ }
12
+
13
+ async function getLocator(page: Page, selector: string) {
14
+ const parsed = parseSelector(selector);
15
+
16
+ switch (parsed.type) {
17
+ case "ref": {
18
+ if (!globalReferenceStore) {
19
+ throw new Error(
20
+ `Element reference @${parsed.value} requires a snapshot first. Run 'hab snapshot -i' to generate element references.`,
21
+ );
22
+ }
23
+
24
+ const actualSelector = globalReferenceStore.get(parsed.value);
25
+ if (!actualSelector) {
26
+ throw new Error(
27
+ `Element reference @${parsed.value} not found. Run 'hab snapshot -i' to update element references.`,
28
+ );
29
+ }
30
+
31
+ return page.locator(actualSelector);
32
+ }
33
+ case "css":
34
+ return page.locator(parsed.value);
35
+ case "text":
36
+ return page.getByText(parsed.value);
37
+ case "xpath":
38
+ return page.locator(`xpath=${parsed.value}`);
39
+ default:
40
+ throw new Error(`Unknown selector type: ${parsed.type}`);
41
+ }
42
+ }
43
+
44
+ export interface DownloadOptions {
45
+ output?: string;
46
+ timeout?: number;
47
+ }
48
+
49
+ export interface DownloadResult {
50
+ success: boolean;
51
+ path: string;
52
+ filename: string;
53
+ size?: number;
54
+ }
55
+
56
+ /**
57
+ * Download file by clicking an element (link/button)
58
+ * Uses browser's native download capability to preserve cookies/auth
59
+ */
60
+ export async function download(
61
+ page: Page,
62
+ selector: string,
63
+ options: DownloadOptions = {},
64
+ ): Promise<DownloadResult> {
65
+ const timeout = options.timeout || 60000;
66
+
67
+ // Determine output directory
68
+ let outputDir: string;
69
+ let specifiedFilename: string | undefined;
70
+
71
+ if (options.output) {
72
+ if (options.output.endsWith("/") || !options.output.includes(".")) {
73
+ // It's a directory
74
+ outputDir = options.output;
75
+ } else {
76
+ // It's a file path
77
+ outputDir = join(options.output, "..");
78
+ specifiedFilename = basename(options.output);
79
+ }
80
+ } else {
81
+ // Default to current directory
82
+ outputDir = process.cwd();
83
+ }
84
+
85
+ // Ensure output directory exists
86
+ if (!existsSync(outputDir)) {
87
+ mkdirSync(outputDir, { recursive: true });
88
+ }
89
+
90
+ const locator = await getLocator(page, selector);
91
+
92
+ // Wait for download event while clicking
93
+ const downloadPromise = page.waitForEvent("download", { timeout });
94
+
95
+ await locator.click();
96
+
97
+ const download = await downloadPromise;
98
+
99
+ // Get suggested filename or use specified one
100
+ const filename = specifiedFilename || download.suggestedFilename();
101
+ const outputPath = join(outputDir, filename);
102
+
103
+ // Save the file
104
+ await download.saveAs(outputPath);
105
+
106
+ // Get file stats if possible
107
+ let size: number | undefined;
108
+ try {
109
+ const stat = await Bun.file(outputPath).stat();
110
+ size = stat?.size;
111
+ } catch {
112
+ // Ignore stat errors
113
+ }
114
+
115
+ return {
116
+ success: true,
117
+ path: outputPath,
118
+ filename,
119
+ size,
120
+ };
121
+ }
122
+
123
+ /**
124
+ * Download file directly from URL
125
+ * Uses Playwright's request API to preserve cookies/auth (bypasses CORS)
126
+ */
127
+ export async function downloadUrl(
128
+ page: Page,
129
+ url: string,
130
+ options: DownloadOptions = {},
131
+ ): Promise<DownloadResult> {
132
+ // Determine output directory and filename
133
+ let outputDir: string;
134
+ let specifiedFilename: string | undefined;
135
+
136
+ if (options.output) {
137
+ if (options.output.endsWith("/") || !options.output.includes(".")) {
138
+ outputDir = options.output;
139
+ } else {
140
+ outputDir = join(options.output, "..");
141
+ specifiedFilename = basename(options.output);
142
+ }
143
+ } else {
144
+ outputDir = process.cwd();
145
+ }
146
+
147
+ // Ensure output directory exists
148
+ if (!existsSync(outputDir)) {
149
+ mkdirSync(outputDir, { recursive: true });
150
+ }
151
+
152
+ // Use Playwright's request API (shares cookies with browser context, bypasses CORS)
153
+ const context = page.context();
154
+ const response = await context.request.get(url, { timeout: options.timeout });
155
+
156
+ if (!response.ok()) {
157
+ throw new Error(`HTTP ${response.status()}: ${response.statusText()}`);
158
+ }
159
+
160
+ // Get filename from Content-Disposition header or URL
161
+ let filename = "";
162
+ const contentDisposition = response.headers()["content-disposition"];
163
+ if (contentDisposition) {
164
+ const match = contentDisposition.match(/filename[^;=\n]*=((['"]).*?\2|[^;\n]*)/);
165
+ if (match) {
166
+ filename = match[1].replace(/['"]/g, "");
167
+ }
168
+ }
169
+
170
+ // Determine final filename
171
+ filename = specifiedFilename || filename || basename(new URL(url).pathname) || "download";
172
+ const outputPath = join(outputDir, filename);
173
+
174
+ // Get body and write to file
175
+ const body = await response.body();
176
+ await Bun.write(outputPath, body);
177
+
178
+ return {
179
+ success: true,
180
+ path: outputPath,
181
+ filename,
182
+ size: body.byteLength,
183
+ };
184
+ }
185
+
186
+ /**
187
+ * Format file size for display
188
+ */
189
+ function formatSize(bytes: number): string {
190
+ if (bytes < 1024) return `${bytes} B`;
191
+ if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`;
192
+ if (bytes < 1024 * 1024 * 1024) return `${(bytes / (1024 * 1024)).toFixed(1)} MB`;
193
+ return `${(bytes / (1024 * 1024 * 1024)).toFixed(1)} GB`;
194
+ }
195
+
196
+ /**
197
+ * Format download result for CLI output
198
+ */
199
+ export function formatDownloadResult(result: DownloadResult): string {
200
+ const sizeStr = result.size ? ` (${formatSize(result.size)})` : "";
201
+ return `Downloaded: ${result.filename}${sizeStr}\nSaved to: ${result.path}`;
202
+ }
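Review note: in the new download module above, both `download` and `downloadUrl` decide between "directory" and "file" with `!options.output.includes(".")`, so a relative directory such as `./downloads` is treated as a file path because of its leading dot. The filename taken from `Content-Disposition` is also joined to the output directory as-is. The sketch below shows one possible shared helper for each concern. `OutputTarget`, `resolveOutputTarget`, and `safeFilename` are hypothetical names, and the extension-based heuristic is an assumption, not the package's behaviour.

```typescript
import { basename, dirname, extname } from "node:path";

export interface OutputTarget {
  dir: string;
  filename?: string;
}

// Decide whether `output` names a directory or a file by looking only at the
// extension of its final path segment.
export function resolveOutputTarget(
  output: string | undefined,
  cwd: string = process.cwd(),
): OutputTarget {
  if (!output) return { dir: cwd };
  if (output.endsWith("/") || extname(output) === "") return { dir: output };
  return { dir: dirname(output), filename: basename(output) };
}

// Reduce a server-supplied filename to its final segment so it cannot point
// outside the output directory.
export function safeFilename(candidate: string, fallback = "download"): string {
  const name = basename(candidate.replace(/\\/g, "/"));
  return name === "" || name === "." || name === ".." ? fallback : name;
}
```

A directory whose name contains a dot still needs a trailing slash under this heuristic, which is a limitation of the sketch.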
@@ -1,5 +1,6 @@
1
1
  import * as actionCommands from "../commands/actions";
2
2
  import * as advancedCommands from "../commands/advanced";
3
+ import * as downloadCommands from "../commands/download";
3
4
  import * as extractCommands from "../commands/extract";
4
5
  import * as getterCommands from "../commands/getters";
5
6
  import * as infoCommands from "../commands/info";
@@ -595,6 +596,26 @@ export class DaemonServer {
595
596
  result = await networkCommands.networkStop(args.listenerId);
596
597
  break;
597
598
 
599
+ // Download commands
600
+ case "download": {
601
+ downloadCommands.setReferenceStore(referenceStore);
602
+ const downloadResult = await downloadCommands.download(page, args.selector, {
603
+ output: args.output,
604
+ timeout: args.timeout,
605
+ });
606
+ result = downloadCommands.formatDownloadResult(downloadResult);
607
+ break;
608
+ }
609
+
610
+ case "download-url": {
611
+ const downloadUrlResult = await downloadCommands.downloadUrl(page, args.url, {
612
+ output: args.output,
613
+ timeout: args.timeout,
614
+ });
615
+ result = downloadCommands.formatDownloadResult(downloadUrlResult);
616
+ break;
617
+ }
618
+
598
619
  default:
599
620
  throw new Error(`Unknown command: ${command}`);
600
621
  }