@sleepinsummer/agent-browser-cli 0.2.0

package/AI_INSTALL.md ADDED
@@ -0,0 +1,126 @@
1
+ # AI Installation Guide
2
+
3
+ Send the prompt below to your AI assistant so it can install the CLI, configure the skill, and verify the setup on your local machine.
4
+
5
+ ```text
6
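+ Please install agent-browser-cli for me: https://github.com/sleepinginsummer/agent-browser-cli
7
+
8
+ Requirements:
9
+ 1. Prefer installing via npm: npm install -g @sleepinsummer/agent-browser-cli
10
+ 2. Walk me through loading assets/tmwd_cdp_bridge in Chrome as an unpacked extension.
11
+ 3. If the extension was loaded before, it must be reloaded from `assets/tmwd_cdp_bridge` in chrome://extensions so that the latest `config.js` and `background.js` take effect.
12
+ 4. Install skills/agent-browser-cli/SKILL.md into a skills directory that the current AI recognizes.
13
+ 5. Run agent-browser-cli tabs, open, and status to verify that everything works.
14
+ 6. If no npm platform package supports the current system yet, fall back to building from source: cargo build --release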
15
+ ```
16
+
17
+ ## 1. Install the CLI
18
+
19
+ Prefer a global npm install:
20
+
21
+ ```bash
22
+ npm install -g @sleepinsummer/agent-browser-cli
23
+ agent-browser-cli --help
24
+ ```
25
+
26
+ The npm package currently installs a native binary for your platform. The published packages are:
27
+
28
+ ```text
29
+ @sleepinsummer/agent-browser-cli
30
+ @sleepinsummer/agent-browser-cli-darwin-arm64
31
+ @sleepinsummer/agent-browser-cli-darwin-x64
32
+ @sleepinsummer/agent-browser-cli-win32-x64
33
+ ```
34
+
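The package list above can be read as a lookup from platform to package name. The following is a minimal sketch, not the package's actual resolver: it assumes the suffixes follow Node's `process.platform` / `process.arch` naming, and `platform_package` is a hypothetical helper introduced only for illustration.

```python
import platform

# Package names exactly as listed in this guide. Keying them by
# (platform, arch) is an assumption based on the package suffixes.
PLATFORM_PACKAGES = {
    ("darwin", "arm64"): "@sleepinsummer/agent-browser-cli-darwin-arm64",
    ("darwin", "x64"): "@sleepinsummer/agent-browser-cli-darwin-x64",
    ("win32", "x64"): "@sleepinsummer/agent-browser-cli-win32-x64",
}

# Translate Python's platform names into the Node-style names used above.
_SYSTEMS = {"Darwin": "darwin", "Windows": "win32"}
_MACHINES = {"arm64": "arm64", "aarch64": "arm64", "x86_64": "x64", "amd64": "x64"}


def platform_package(system=None, machine=None):
    """Return the matching platform package name, or None if unsupported."""
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    key = (_SYSTEMS.get(system), _MACHINES.get(machine.lower()))
    return PLATFORM_PACKAGES.get(key)


if __name__ == "__main__":
    name = platform_package()
    print(name or "no prebuilt package; fall back to cargo build --release")
```

A `None` result corresponds to the case where no platform package exists for the current system and a source build is the fallback.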
35
+ If no package has been published for your platform yet, or the install fails, build from source:
36
+
37
+ ```bash
38
+ git clone https://github.com/sleepinginsummer/agent-browser-cli.git
39
+ cd agent-browser-cli
40
+ cargo build --release
41
+ ./target/release/agent-browser-cli --help
42
+ ```
43
+
44
+ ## 2. Load the Chrome Extension
45
+
46
+ If you installed via npm, first download or clone the repository. It is needed to load the extension and install the skill:
47
+
48
+ ```bash
49
+ git clone https://github.com/sleepinginsummer/agent-browser-cli.git
50
+ cd agent-browser-cli
51
+ ```
52
+
53
+ In Chrome, open:
54
+
55
+ ```text
56
+ chrome://extensions
57
+ ```
58
+
59
+ Enable "Developer mode", then load this directory as an unpacked extension:
60
+
61
+ ```text
62
+ assets/tmwd_cdp_bridge
63
+ ```
64
+
65
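+ If you previously installed the `tmwd_cdp_bridge` extension from an older GenericAgent release, you can keep using it because it speaks the same protocol. Loading `assets/tmwd_cdp_bridge` from this repository and clicking "Reload" is still recommended.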
66
+
67
+ The current extension configuration should contain:
68
+
69
+ ```js
70
+ const TID = '__agent_browser_cli_bridge_26c9f1';
71
+ ```
72
+
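To confirm that a loaded extension carries the identifier shown above, the declaration can be read back from the JavaScript source. This is a minimal sketch under assumptions: the guide names `config.js` as an extension file but does not say where `TID` is declared, so reading it from `assets/tmwd_cdp_bridge/config.js` is a guess, and `read_tid` / `has_current_tid` are hypothetical helpers.

```python
import re

# Identifier quoted in this guide.
EXPECTED_TID = "__agent_browser_cli_bridge_26c9f1"

# Matches a declaration such as: const TID = '...';  (single or double quotes)
_TID_RE = re.compile(r"""const\s+TID\s*=\s*(['"])(.*?)\1""")


def read_tid(config_source):
    """Return the TID string declared in the given JS source, or None."""
    match = _TID_RE.search(config_source)
    return match.group(2) if match else None


def has_current_tid(config_source):
    """Return True if the source declares the identifier expected by this guide."""
    return read_tid(config_source) == EXPECTED_TID
```

Usage would look like `has_current_tid(open(path, encoding="utf-8").read())`, where `path` points at whichever extension file declares `TID`.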
73
+ Chrome needs at least one normal web page tab open. Do not leave it only on `about:blank` or `chrome://` pages.
74
+
75
+ ## 3. Install the Skill
76
+
77
+ Install `skills/agent-browser-cli/SKILL.md` from the repository into the skills directory your AI uses.
78
+
79
+ Example for a generic directory:
80
+
81
+ ```bash
82
+ mkdir -p ~/.agents/skills/agent-browser-cli
83
+ cp skills/agent-browser-cli/SKILL.md ~/.agents/skills/agent-browser-cli/SKILL.md
84
+ ```
85
+
86
+ Example for the default Codex directory:
87
+
88
+ ```bash
89
+ mkdir -p ~/.codex/skills/agent-browser-cli
90
+ cp skills/agent-browser-cli/SKILL.md ~/.codex/skills/agent-browser-cli/SKILL.md
91
+ ```
92
+
93
+ If your AI uses a different skills directory, copy `SKILL.md` to `agent-browser-cli/SKILL.md` under that directory.
94
+
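The copy step is the same for every skills directory, so it can be wrapped in one function. This is a sketch of the `mkdir -p` plus `cp` pattern shown above; `install_skill` is a hypothetical helper, and the skills directory is whatever your AI tool actually reads.

```python
import shutil
from pathlib import Path


def install_skill(repo_root, skills_dir):
    """Copy SKILL.md into <skills_dir>/agent-browser-cli/ and return the new path."""
    source = Path(repo_root) / "skills" / "agent-browser-cli" / "SKILL.md"
    target_dir = Path(skills_dir).expanduser() / "agent-browser-cli"
    target_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    return Path(shutil.copy2(source, target_dir / "SKILL.md"))
```

For example, `install_skill(".", "~/.agents/skills")` mirrors the generic directory example, and `install_skill(".", "~/.codex/skills")` mirrors the Codex one.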
95
+ ## 4. Verify
96
+
97
+ ```bash
98
+ agent-browser-cli tabs
99
+ agent-browser-cli open https://www.baidu.com
100
+ agent-browser-cli status
101
+ ```
102
+
103
+ On success, `tabs` returns `ok: true` along with the number of tabs currently open in Chrome.
104
+ `open` should open a new tab natively. Do not substitute `exec --monitor` or `window.open` for it.
105
+
106
+ If the long-lived service needs to pick up the latest code:
107
+
108
+ ```bash
109
+ agent-browser-cli restart
110
+ ```
111
+
112
+ ## 5. Getting Started
113
+
114
+ Once you have a tab ID, you can run:
115
+
116
+ ```bash
117
+ agent-browser-cli scan --tab <tabId> --text-only
118
+ agent-browser-cli exec --tab <tabId> 'return document.title'
119
+ ```
120
+
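When an agent drives the CLI from a script, building the argument list explicitly avoids shell quoting problems with the JavaScript snippet. The sketch below uses only the subcommands and flags shown above; `scan_argv` and `exec_argv` are hypothetical helpers, and the tab ID `123` is a placeholder.

```python
import shlex

CLI = "agent-browser-cli"


def scan_argv(tab_id):
    """Argument list for a text-only scan of one tab."""
    return [CLI, "scan", "--tab", str(tab_id), "--text-only"]


def exec_argv(tab_id, script):
    """Argument list for running a JavaScript snippet in one tab."""
    return [CLI, "exec", "--tab", str(tab_id), script]


if __name__ == "__main__":
    # Print the shell-quoted form of the command for inspection.
    print(shlex.join(exec_argv("123", "return document.title")))
```

Either list can be passed directly to `subprocess.run(argv, capture_output=True, text=True)`, which passes the script through as a single argument.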
121
+ For the full command list and browser operation SOP, see:
122
+
123
+ ```text
124
+ skills/agent-browser-cli/SKILL.md
125
+ ```
126
+
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 sleepinginsummer
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,194 @@
1
+ <div align="center">
2
+
3
+ # agent-browser-cli
4
+
5
+ A browser perception and control CLI for agents, turning a real Chrome session into reusable tab scanning, page JavaScript, cookie, CDP, and screenshot capabilities.
6
+
7
+ Browser perception · Page control · Chrome session reuse · CDP · Conditional wait · Agent Skill integration
8
+
9
+ <p>
10
+ <a href="https://github.com/sleepinginsummer/agent-browser-cli"><img src="https://img.shields.io/badge/CLI-agentbrowsercli-2ea44f" alt="CLI agentbrowsercli"></a>
11
+ <a href="https://github.com/sleepinginsummer/agent-browser-cli/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-green" alt="License MIT"></a>
12
+ <a href="https://github.com/sleepinginsummer/agent-browser-cli"><img src="https://img.shields.io/badge/Windows-MacOS-0078D6?labelColor=0078D6&color=C0C0C0" alt="Windows/MacOS"></a>
13
+ <a href="https://github.com/sleepinginsummer/agent-browser-cli/releases"><img src="https://img.shields.io/badge/release-v0.2.0-blue" alt="release v0.2.0"></a>
14
+ <a href="https://github.com/sleepinginsummer/agent-browser-cli/pulls"><img src="https://img.shields.io/badge/PRs-welcome-brightgreen" alt="PRs welcome"></a>
15
+ </p>
16
+
17
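+ [AI One-Line Install](#ai-one-line-install) · [Manual Installation](#manual-installation) · [Chrome Extension](#chrome-extension) · [Update](#update) · [Uninstall](#uninstall) · [Friendly Links](#friendly-links)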
18
+
19
+ 中文 | [English](README_EN.md)
20
+
21
+ </div>
22
+
23
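+ `agent-browser-cli` is a browser perception and control tool for agents. It connects to the user's real Chrome browser through a Chrome extension, preserving login state and cookies while providing tab scanning, page JavaScript execution, cookie reading, CDP control, screenshots, file uploads, dropdown clicks, and related capabilities.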
24
+
25
+ This project is not Selenium or Playwright. It is better suited to helping agents read pages accurately and perform actions inside an existing browser session.
26
+
27
+ ## Project Info
28
+
29
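+ - Current version: `0.2.0`
30
+ - Supported platforms: Windows, macOS
31
+ - Runtime: native Rust binary; an npm install selects the binary package for your platform
32
+ - Browser: Chrome / Chromium, with `assets/tmwd_cdp_bridge` loaded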
33
+
34
+ ## Acknowledgements
35
+
36
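+ The browser control capability in this project was extracted and adapted from the Web toolchain in [GenericAgent](https://github.com/lsdefine/GenericAgent), including ideas and implementation around `TMWebDriver`, `simphtml`, and the `tmwd_cdp_bridge` extension.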
37
+
38
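+ Thanks to the GenericAgent project for the browser bridge, page simplification, CDP control, and practical SOPs. This repository reorganizes and enhances that work for standalone usage and CLI invocation.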
39
+
40
+ ## AI One-Line Install
41
+
42
+ ```text
43
+ Please read https://github.com/sleepinginsummer/agent-browser-cli/blob/main/AI_INSTALL.md and follow its instructions to install the CLI, load the Chrome extension, and add `skills/agent-browser-cli/SKILL.md`.
44
+ ```
45
+
46
+ ## Improvements
47
+
48
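+ - Extracted the browser control capability from GenericAgent and exposed it as a CLI for Codex, Claude Code, and OpenCode. The GenericAgent browser extension can be shared and does not need to be reinstalled.
49
+ - Avoids reinitializing the browser connection for every command.
50
+ - Adds a startup lock so that multiple CLI processes starting concurrently do not bind the underlying ports twice.
51
+ - Adds the skill `skills/agent-browser-cli/SKILL.md` as a usage reference for AI agents.
52
+ - Includes several optimizations that reduce command execution time.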
53
+
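The startup lock mentioned in the list above can be illustrated with an exclusive-create lock file. This is a sketch of the general technique only, not the project's implementation, which the README does not describe. The function names are hypothetical, and the lock path is left to the caller.

```python
import os


def try_acquire(lock_path):
    """Atomically create the lock file; return False if it already exists."""
    try:
        # O_EXCL makes creation fail if another process already made the file.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the owner for debugging
    return True


def release(lock_path):
    """Remove the lock file; a missing file is not an error."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```

Only the process that gets `True` would go on to bind the ports. The others would wait and reuse the service that is already running. A production lock would also need to handle stale files left behind by a crashed process.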
54
+ ## Layout
55
+
56
+ ```text
57
+ .
58
+ ├── agent_browser_cli.py # CLI entry
59
+ ├── agent_browser_server.py # Long-lived HTTP service
60
+ ├── ga.py # web_scan / web_execute_js entry
61
+ ├── TMWebDriver.py # Browser extension WebSocket / HTTP bridge
62
+ ├── simphtml.py # Page simplification and DOM diff
63
+ ├── assets/tmwd_cdp_bridge/ # Chrome MV3 extension
64
+ ├── memory/ # Browser tool SOPs
65
+ └── skills/agent-browser-cli/ # skill
66
+ ```
67
+
68
+ ## Manual Installation
69
+
70
+ ### Install via npm
71
+
72
+ ```bash
73
+ npm install -g @sleepinsummer/agent-browser-cli
74
+ agent-browser-cli tabs
75
+ ```
76
+
77
+ npm distribution currently uses a main package plus per-platform binary packages:
78
+
79
+ ```text
80
+ @sleepinsummer/agent-browser-cli
81
+ @sleepinsummer/agent-browser-cli-darwin-arm64
82
+ @sleepinsummer/agent-browser-cli-darwin-x64
83
+ @sleepinsummer/agent-browser-cli-win32-x64
84
+ ```
85
+
86
+ ### Build from Source
87
+
88
+ ```bash
89
+ cargo build --release
90
+ ./target/release/agent-browser-cli tabs
91
+ ```
92
+
93
+ ### Legacy Python Workflow
94
+
95
+ The Python implementation is kept for now as a migration reference and fallback entry point:
96
+
97
+ ```bash
98
+ cd /path/to/agent-browser-cli
99
+ python3 -m venv .venv
100
+ .venv/bin/python -m pip install -r requirements.txt
101
+ ```
102
+
103
+ ## Chrome Extension
104
+
105
+ Load this extension directory:
106
+
107
+ ```text
108
+ assets/tmwd_cdp_bridge
109
+ ```
110
+
111
+ Chrome needs at least one normal web page tab open. Do not leave it only on `about:blank` or `chrome://` pages.
112
+
113
+ ## Quick Check
114
+
115
+ ```bash
116
+ agent-browser-cli tabs
117
+ agent-browser-cli open https://www.baidu.com
118
+ ```
119
+
120
+ On success, it returns:
121
+
122
+ ```json
123
+ {
124
+ "ok": true,
125
+ "result": {
126
+ "status": "success",
127
+ "metadata": {
128
+ "tabs_count": 1
129
+ }
130
+ }
131
+ }
132
+ ```
133
+
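A script that runs the quick check can validate the response instead of relying on a visual inspection. The sketch below assumes the response has exactly the shape shown above; `tabs_count` is a hypothetical helper, and any other fields the CLI may return are ignored.

```python
import json


def tabs_count(raw_output):
    """Return the tab count from a `tabs` JSON response, or None on failure."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or payload.get("ok") is not True:
        return None
    result = payload.get("result")
    if not isinstance(result, dict) or result.get("status") != "success":
        return None
    metadata = result.get("metadata")
    if not isinstance(metadata, dict):
        return None
    return metadata.get("tabs_count")
```

A return value of `None` covers malformed output, `ok: false`, and a non-success status.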
134
+ ## Common Commands
135
+
136
+ The README only keeps the quick entry point. For the full command list and browser operation SOP, see [skills/agent-browser-cli/SKILL.md](./skills/agent-browser-cli/SKILL.md).
137
+
138
+ ```bash
139
+ agent-browser-cli tabs
140
+ ```
141
+
142
+ ## Update
143
+
144
+ ```bash
145
+ git pull
146
+ cargo build --release
147
+ ./target/release/agent-browser-cli restart
148
+ ```
149
+
150
+ If the Chrome extension has updates, reload the `assets/tmwd_cdp_bridge` extension in `chrome://extensions`.
151
+
152
+ Current extension bridge identifier:
153
+
154
+ ```js
155
+ const TID = '__agent_browser_cli_bridge_26c9f1';
156
+ ```
157
+
158
+ If you installed the skill into a global Codex/Agent directory, copy it again after updating:
159
+
160
+ ```bash
161
+ mkdir -p ~/.agents/skills/agent-browser-cli
162
+ cp skills/agent-browser-cli/SKILL.md ~/.agents/skills/agent-browser-cli/SKILL.md
163
+ ```
164
+
165
+ ## Uninstall
166
+
167
+ Stop the long-lived service first:
168
+
169
+ ```bash
170
+ agent-browser-cli stop
171
+ ```
172
+
173
+ Then clean up as needed:
174
+
175
+ ```bash
176
+ rm -f .agent-browser-cli.log .agent-browser-cli.lock
177
+ rm -rf ~/.agents/skills/agent-browser-cli
178
+ ```
179
+
180
+ Finally, remove the `TMWD CDP Bridge` extension from Chrome's extension management page, or remove the loaded `assets/tmwd_cdp_bridge` extension configuration.
181
+
182
+ ## Ports
183
+
184
+ - `18765`: underlying `TMWebDriver` WebSocket, used by the Chrome extension.
185
+ - `18766`: underlying `TMWebDriver` HTTP `/link`, used by the internal master/remote protocol.
186
+ - `18767`: outer `agent-browser-cli` HTTP service, used by the CLI to reuse the session.
187
+
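When a command hangs or fails to connect, it helps to see which of these ports is accepting connections. The sketch below is a plain TCP probe. It assumes the services listen on `127.0.0.1`, which the README does not state, and `is_listening` is a hypothetical helper.

```python
import socket

# Port roles as described in the list above.
PORTS = {
    18765: "TMWebDriver WebSocket",
    18766: "TMWebDriver HTTP /link",
    18767: "agent-browser-cli HTTP service",
}


def is_listening(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port, role in PORTS.items():
        state = "listening" if is_listening(port) else "closed"
        print(f"{port} ({role}): {state}")
```

A closed `18767` with the other two open would suggest that the outer service is down while the bridge is still up, and the reverse would point at the bridge.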
188
+ ## Friendly Links
189
+
190
+ - [LINUX DO - A New Ideal Community](https://linux.do/)
191
+
192
+ ## License
193
+
194
+ MIT License. See [LICENSE](./LICENSE).
package/README_EN.md ADDED
@@ -0,0 +1,169 @@
1
+ <div align="center">
2
+
3
+ # agent-browser-cli
4
+
5
+ A browser perception and control CLI for agents, turning a real Chrome session into reusable tab scanning, page JavaScript, Cookie, CDP, and screenshot capabilities.
6
+
7
+ Browser perception · Page control · Chrome session reuse · CDP · Conditional wait · Agent Skill integration
8
+
9
+ <p>
10
+ <a href="https://github.com/sleepinginsummer/agent-browser-cli"><img src="https://img.shields.io/badge/CLI-agentbrowsercli-2ea44f" alt="CLI agentbrowsercli"></a>
11
+ <a href="https://github.com/sleepinginsummer/agent-browser-cli/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-green" alt="License MIT"></a>
12
+ <a href="https://www.python.org/"><img src="https://img.shields.io/badge/Python-%3E%3D3.10-3776AB?logo=python&logoColor=white" alt="Python >=3.10"></a>
13
+ <a href="https://github.com/sleepinginsummer/agent-browser-cli"><img src="https://img.shields.io/badge/Windows-MacOS-0078D6?labelColor=0078D6&color=C0C0C0" alt="Windows/MacOS"></a>
14
+ <a href="https://github.com/sleepinginsummer/agent-browser-cli/releases"><img src="https://img.shields.io/badge/release-v0.1.1-blue" alt="release v0.1.1"></a>
15
+ <a href="https://github.com/sleepinginsummer/agent-browser-cli/pulls"><img src="https://img.shields.io/badge/PRs-welcome-brightgreen" alt="PRs welcome"></a>
16
+ </p>
17
+
18
+ [AI One-Line Install](#ai-one-line-install) · [Manual Installation](#manual-installation) · [Chrome Extension](#chrome-extension) · [Update](#update) · [Uninstall](#uninstall) · [Friendly Links](#friendly-links)
19
+
20
+ [中文](README.md) | English
21
+
22
+ </div>
23
+
24
+ `agent-browser-cli` is a browser perception and control tool for agents. It connects to the user's real Chrome browser through a Chrome extension, preserving login state and cookies while providing tab scanning, page JavaScript execution, cookie reading, CDP control, screenshots, file uploads, dropdown clicks, and related capabilities.
25
+
26
+ This project is not Selenium or Playwright. It is better suited for helping agents read pages accurately and perform actions inside an existing browser session.
27
+
28
+ ## Project Info
29
+
30
+ - Current version: `0.1.1`
31
+ - Supported platforms: Windows, macOS
32
+ - Python: `3.10+` recommended
33
+ - Browser: Chrome / Chromium, with `assets/tmwd_cdp_bridge` loaded
34
+
35
+ ## Acknowledgements
36
+
37
+ The browser control capability in this project was extracted and adapted from the Web toolchain in [GenericAgent](https://github.com/lsdefine/GenericAgent), including ideas and implementation around `TMWebDriver`, `simphtml`, and the `tmwd_cdp_bridge` extension.
38
+
39
+ Thanks to the GenericAgent project for the browser bridge, page simplification, CDP control, and practical SOPs. This repository reorganizes and enhances that work for standalone usage and CLI invocation.
40
+
41
+ ## AI One-Line Install
42
+
43
+ ```text
44
+ Please read https://github.com/sleepinginsummer/agent-browser-cli/blob/main/AI_INSTALL.md and follow its instructions to install the CLI, load the Chrome extension, and add `skills/agent-browser-cli/SKILL.md`.
45
+ ```
46
+
47
+ ## Improvements
48
+
49
+ - Extracted browser control capability from GenericAgent and exposed it as a CLI for Codex, Claude Code, and OpenCode. The GenericAgent browser extension can be reused and does not need to be reinstalled.
50
+ - Avoids reinitializing the browser connection for every command.
51
+ - Adds a startup lock to avoid repeated low-level port binding when multiple CLI commands start concurrently.
52
+ - Adds the skill `skills/agent-browser-cli/SKILL.md` for AI usage reference.
53
+ - Includes several optimizations to reduce command execution time.
54
+
55
+ ## Layout
56
+
57
+ ```text
58
+ .
59
+ ├── agent_browser_cli.py # CLI entry
60
+ ├── agent_browser_server.py # Long-lived HTTP service
61
+ ├── ga.py # web_scan / web_execute_js entry
62
+ ├── TMWebDriver.py # Browser extension WebSocket / HTTP bridge
63
+ ├── simphtml.py # Page simplification and DOM diff
64
+ ├── assets/tmwd_cdp_bridge/ # Chrome MV3 extension
65
+ ├── memory/ # Browser tool SOPs
66
+ └── skills/agent-browser-cli/ # skill
67
+ ```
68
+
69
+ ## Manual Installation
70
+
71
+ ```bash
72
+ cd /path/to/agent-browser-cli
73
+ python3 -m venv .venv
74
+ .venv/bin/python -m pip install -r requirements.txt
75
+ ```
76
+
77
+ ## Chrome Extension
78
+
79
+ Load this extension directory:
80
+
81
+ ```text
82
+ assets/tmwd_cdp_bridge
83
+ ```
84
+
85
+ Chrome needs at least one normal web page tab open. Do not leave it only on `about:blank` or `chrome://` pages.
86
+
87
+ ## Quick Check
88
+
89
+ ```bash
90
+ .venv/bin/python agent_browser_cli.py tabs
91
+ .venv/bin/python agent_browser_cli.py open https://www.baidu.com
92
+ ```
93
+
94
+ On success, it returns:
95
+
96
+ ```json
97
+ {
98
+ "ok": true,
99
+ "result": {
100
+ "status": "success",
101
+ "metadata": {
102
+ "tabs_count": 1
103
+ }
104
+ }
105
+ }
106
+ ```
107
+
108
+ ## Common Commands
109
+
110
+ The README only keeps the quick entry point. For the full command list and browser operation SOP, see [skills/agent-browser-cli/SKILL.md](./skills/agent-browser-cli/SKILL.md).
111
+
112
+ ```bash
113
+ .venv/bin/python agent_browser_cli.py tabs
114
+ ```
115
+
116
+ ## Update
117
+
118
+ ```bash
119
+ git pull
120
+ .venv/bin/python -m pip install -r requirements.txt
121
+ .venv/bin/python agent_browser_cli.py restart
122
+ ```
123
+
124
+ If the Chrome extension has updates, reload the `assets/tmwd_cdp_bridge` extension in `chrome://extensions`.
125
+
126
+ Current extension bridge identifier:
127
+
128
+ ```js
129
+ const TID = '__agent_browser_cli_bridge_26c9f1';
130
+ ```
131
+
132
+ If you installed the skill into a global Codex/Agent directory, copy it again after updating:
133
+
134
+ ```bash
135
+ mkdir -p ~/.agents/skills/agent-browser-cli
136
+ cp skills/agent-browser-cli/SKILL.md ~/.agents/skills/agent-browser-cli/SKILL.md
137
+ ```
138
+
139
+ ## Uninstall
140
+
141
+ Stop the long-lived service first:
142
+
143
+ ```bash
144
+ .venv/bin/python agent_browser_cli.py stop
145
+ ```
146
+
147
+ Then clean up as needed:
148
+
149
+ ```bash
150
+ rm -rf .venv
151
+ rm -f .agent-browser-cli.log .agent-browser-cli.lock
152
+ rm -rf ~/.agents/skills/agent-browser-cli
153
+ ```
154
+
155
+ Finally, remove the `TMWD CDP Bridge` extension from Chrome's extension management page, or remove the loaded `assets/tmwd_cdp_bridge` extension configuration.
156
+
157
+ ## Ports
158
+
159
+ - `18765`: underlying `TMWebDriver` WebSocket, used by the Chrome extension.
160
+ - `18766`: underlying `TMWebDriver` HTTP `/link`, used by the internal master/remote protocol.
161
+ - `18767`: outer `agent-browser-cli` HTTP service, used by the CLI to reuse the session.
162
+
163
+ ## Friendly Links
164
+
165
+ - [LINUX DO - A New Ideal Community](https://linux.do/)
166
+
167
+ ## License
168
+
169
+ MIT License. See [LICENSE](./LICENSE).