@sleepinsummer/agent-browser-cli 0.2.0 → 0.2.1
- package/README.md +10 -30
- package/README_EN.md +26 -21
- package/assets/simphtml_find_list.js +266 -0
- package/assets/simphtml_opt.js +324 -0
- package/package.json +6 -5
- package/skills/agent-browser-cli/SKILL.md +27 -60
- package/npm/platform/darwin-arm64/bin/agent-browser-cli +0 -0
- package/npm/platform/darwin-arm64/package.json +0 -14
package/README.md
CHANGED
|
@@ -9,8 +9,8 @@
|
|
|
9
9
|
<p>
|
|
10
10
|
<a href="https://github.com/sleepinginsummer/agent-browser-cli"><img src="https://img.shields.io/badge/CLI-agentbrowsercli-2ea44f" alt="CLI agentbrowsercli"></a>
|
|
11
11
|
<a href="https://github.com/sleepinginsummer/agent-browser-cli/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-green" alt="License MIT"></a>
|
|
12
|
-
|
|
13
|
-
<a href="https://github.com/sleepinginsummer/agent-browser-cli/releases"><img src="https://img.shields.io/badge/release-v0.2.
|
|
12
|
+
<a href="https://github.com/sleepinginsummer/agent-browser-cli"><img src="https://img.shields.io/badge/Windows-MacOS-0078D6?labelColor=0078D6&color=C0C0C0" alt="Windows/MacOS"></a>
|
|
13
|
+
<a href="https://github.com/sleepinginsummer/agent-browser-cli/releases"><img src="https://img.shields.io/badge/release-v0.2.1-blue" alt="release v0.2.1"></a>
|
|
14
14
|
<a href="https://github.com/sleepinginsummer/agent-browser-cli/pulls"><img src="https://img.shields.io/badge/PRs-welcome-brightgreen" alt="PRs welcome"></a>
|
|
15
15
|
</p>
|
|
16
16
|
|
|
@@ -26,9 +26,8 @@
|
|
|
26
26
|
|
|
27
27
|
## Project Info
|
|
28
28
|
|
|
29
|
-
- Current version: `0.2.
|
|
29
|
+
- Current version: `0.2.1`
|
|
30
30
|
- Supported platforms: Windows, macOS
|
|
31
|
-
- Runtime: native Rust binary; the npm install selects the binary package for the current platform
|
|
32
31
|
- Browser: Chrome / Chromium, with `assets/tmwd_cdp_bridge` loaded
|
|
33
32
|
|
|
34
33
|
## Acknowledgements
|
|
@@ -50,19 +49,19 @@
|
|
|
50
49
|
- Adds a startup lock to avoid repeated low-level port binding when multiple CLI commands start concurrently.
|
|
51
50
|
- Adds the skill `skills/agent-browser-cli/SKILL.md` for AI usage reference.
|
|
52
51
|
- Includes several optimizations to reduce command execution time.
|
|
52
|
+
- Rust implementation for the CLI side.
|
|
53
53
|
|
|
54
54
|
## Layout
|
|
55
55
|
|
|
56
56
|
```text
|
|
57
57
|
.
|
|
58
|
-
├──
|
|
59
|
-
├──
|
|
60
|
-
├── ga.py # web_scan / web_execute_js entry
|
|
61
|
-
├── TMWebDriver.py # Browser extension WebSocket / HTTP bridge
|
|
62
|
-
├── simphtml.py # Page simplification and DOM diff
|
|
58
|
+
├── Cargo.toml # Rust crate config
|
|
59
|
+
├── src/ # Rust CLI / daemon / bridge
|
|
63
60
|
├── assets/tmwd_cdp_bridge/ # Chrome MV3 extension
|
|
64
|
-
├──
|
|
65
|
-
|
|
61
|
+
├── assets/simphtml_opt.js # Page simplification script
|
|
62
|
+
├── assets/simphtml_find_list.js # List detection script
|
|
63
|
+
├── npm/ # npm launcher scripts
|
|
64
|
+
└── skills/agent-browser-cli/ # skill
|
|
66
65
|
```
|
|
67
66
|
|
|
68
67
|
## Manual Installation
|
|
@@ -74,15 +73,6 @@ npm install -g @sleepinsummer/agent-browser-cli
|
|
|
74
73
|
agent-browser-cli tabs
|
|
75
74
|
```
|
|
76
75
|
|
|
77
|
-
npm distribution currently uses a main package plus per-platform binary packages:
|
|
78
|
-
|
|
79
|
-
```text
|
|
80
|
-
@sleepinsummer/agent-browser-cli
|
|
81
|
-
@sleepinsummer/agent-browser-cli-darwin-arm64
|
|
82
|
-
@sleepinsummer/agent-browser-cli-darwin-x64
|
|
83
|
-
@sleepinsummer/agent-browser-cli-win32-x64
|
|
84
|
-
```
|
|
85
|
-
|
|
86
76
|
### Build From Source
|
|
87
77
|
|
|
88
78
|
```bash
|
|
@@ -90,15 +80,6 @@ cargo build --release
|
|
|
90
80
|
./target/release/agent-browser-cli tabs
|
|
91
81
|
```
|
|
92
82
|
|
|
93
|
-
### Running the Legacy Python Version
|
|
94
|
-
|
|
95
|
-
The Python implementation is kept for now as a migration reference and fallback entry point:
|
|
96
|
-
|
|
97
|
-
```bash
|
|
98
|
-
cd /path/to/agent-browser-cli
|
|
99
|
-
python3 -m venv .venv
|
|
100
|
-
.venv/bin/python -m pip install -r requirements.txt
|
|
101
|
-
```
|
|
102
83
|
|
|
103
84
|
## Chrome Extension
|
|
104
85
|
|
|
@@ -182,7 +163,6 @@ rm -rf ~/.agents/skills/agent-browser-cli
|
|
|
182
163
|
## Ports
|
|
183
164
|
|
|
184
165
|
- `18765`: underlying `TMWebDriver` WebSocket, used by the Chrome extension.
|
|
185
|
-
- `18766`: underlying `TMWebDriver` HTTP `/link`, used by the internal master/remote protocol.
|
|
186
166
|
- `18767`: outer `agent-browser-cli` HTTP service, used by the CLI to reuse the session.
|
|
187
167
|
|
|
188
168
|
## Friendly Links
|
package/README_EN.md
CHANGED
|
@@ -9,9 +9,8 @@ Browser perception · Page control · Chrome session reuse · CDP · Conditional
|
|
|
9
9
|
<p>
|
|
10
10
|
<a href="https://github.com/sleepinginsummer/agent-browser-cli"><img src="https://img.shields.io/badge/CLI-agentbrowsercli-2ea44f" alt="CLI agentbrowsercli"></a>
|
|
11
11
|
<a href="https://github.com/sleepinginsummer/agent-browser-cli/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-green" alt="License MIT"></a>
|
|
12
|
-
<a href="https://www.python.org/"><img src="https://img.shields.io/badge/Python-%3E%3D3.10-3776AB?logo=python&logoColor=white" alt="Python >=3.10"></a>
|
|
13
12
|
<a href="https://github.com/sleepinginsummer/agent-browser-cli"><img src="https://img.shields.io/badge/Windows-MacOS-0078D6?labelColor=0078D6&color=C0C0C0" alt="Windows/MacOS"></a>
|
|
14
|
-
<a href="https://github.com/sleepinginsummer/agent-browser-cli/releases"><img src="https://img.shields.io/badge/release-v0.
|
|
13
|
+
<a href="https://github.com/sleepinginsummer/agent-browser-cli/releases"><img src="https://img.shields.io/badge/release-v0.2.1-blue" alt="release v0.2.1"></a>
|
|
15
14
|
<a href="https://github.com/sleepinginsummer/agent-browser-cli/pulls"><img src="https://img.shields.io/badge/PRs-welcome-brightgreen" alt="PRs welcome"></a>
|
|
16
15
|
</p>
|
|
17
16
|
|
|
@@ -27,9 +26,8 @@ This project is not Selenium or Playwright. It is better suited for helping agen
|
|
|
27
26
|
|
|
28
27
|
## Project Info
|
|
29
28
|
|
|
30
|
-
- Current version: `0.
|
|
29
|
+
- Current version: `0.2.1`
|
|
31
30
|
- Supported platforms: Windows, macOS
|
|
32
|
-
- Python: `3.10+` recommended
|
|
33
31
|
- Browser: Chrome / Chromium, with `assets/tmwd_cdp_bridge` loaded
|
|
34
32
|
|
|
35
33
|
## Acknowledgements
|
|
@@ -51,29 +49,38 @@ Please read https://github.com/sleepinginsummer/agent-browser-cli/blob/main/AI_I
|
|
|
51
49
|
- Adds a startup lock to avoid repeated low-level port binding when multiple CLI commands start concurrently.
|
|
52
50
|
- Adds the skill `skills/agent-browser-cli/SKILL.md` for AI usage reference.
|
|
53
51
|
- Includes several optimizations to reduce command execution time.
|
|
52
|
+
- Rust implementation for the CLI side.
|
|
54
53
|
|
|
55
54
|
## Layout
|
|
56
55
|
|
|
57
56
|
```text
|
|
58
57
|
.
|
|
59
|
-
├──
|
|
60
|
-
├──
|
|
61
|
-
├── ga.py # web_scan / web_execute_js entry
|
|
62
|
-
├── TMWebDriver.py # Browser extension WebSocket / HTTP bridge
|
|
63
|
-
├── simphtml.py # Page simplification and DOM diff
|
|
58
|
+
├── Cargo.toml # Rust crate config
|
|
59
|
+
├── src/ # Rust CLI / daemon / bridge
|
|
64
60
|
├── assets/tmwd_cdp_bridge/ # Chrome MV3 extension
|
|
65
|
-
├──
|
|
61
|
+
├── assets/simphtml_opt.js # Page simplification script
|
|
62
|
+
├── assets/simphtml_find_list.js # List detection script
|
|
63
|
+
├── npm/ # npm launcher scripts
|
|
66
64
|
└── skills/agent-browser-cli/ # skill
|
|
67
65
|
```
|
|
68
66
|
|
|
69
67
|
## Manual Installation
|
|
70
68
|
|
|
69
|
+
### npm
|
|
70
|
+
|
|
71
|
+
```bash
|
|
72
|
+
npm install -g @sleepinsummer/agent-browser-cli
|
|
73
|
+
agent-browser-cli tabs
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
### Build From Source
|
|
77
|
+
|
|
71
78
|
```bash
|
|
72
|
-
|
|
73
|
-
|
|
74
|
-
.venv/bin/python -m pip install -r requirements.txt
|
|
79
|
+
cargo build --release
|
|
80
|
+
./target/release/agent-browser-cli tabs
|
|
75
81
|
```
|
|
76
82
|
|
|
83
|
+
|
|
77
84
|
## Chrome Extension
|
|
78
85
|
|
|
79
86
|
Load this extension directory:
|
|
@@ -87,8 +94,8 @@ Chrome needs at least one normal web page tab open. Do not leave it only on `abo
|
|
|
87
94
|
## Quick Check
|
|
88
95
|
|
|
89
96
|
```bash
|
|
90
|
-
|
|
91
|
-
|
|
97
|
+
agent-browser-cli tabs
|
|
98
|
+
agent-browser-cli open https://www.baidu.com
|
|
92
99
|
```
|
|
93
100
|
|
|
94
101
|
On success, it returns:
|
|
@@ -110,15 +117,15 @@ On success, it returns:
|
|
|
110
117
|
The README only keeps the quick entry point. For the full command list and browser operation SOP, see [skills/agent-browser-cli/SKILL.md](./skills/agent-browser-cli/SKILL.md).
|
|
111
118
|
|
|
112
119
|
```bash
|
|
113
|
-
|
|
120
|
+
agent-browser-cli tabs
|
|
114
121
|
```
|
|
115
122
|
|
|
116
123
|
## Update
|
|
117
124
|
|
|
118
125
|
```bash
|
|
119
126
|
git pull
|
|
120
|
-
|
|
121
|
-
|
|
127
|
+
cargo build --release
|
|
128
|
+
./target/release/agent-browser-cli restart
|
|
122
129
|
```
|
|
123
130
|
|
|
124
131
|
If the Chrome extension has updates, reload the `assets/tmwd_cdp_bridge` extension in `chrome://extensions`.
|
|
@@ -141,13 +148,12 @@ cp skills/agent-browser-cli/SKILL.md ~/.agents/skills/agent-browser-cli/SKILL.md
|
|
|
141
148
|
Stop the long-lived service first:
|
|
142
149
|
|
|
143
150
|
```bash
|
|
144
|
-
|
|
151
|
+
agent-browser-cli stop
|
|
145
152
|
```
|
|
146
153
|
|
|
147
154
|
Then clean up as needed:
|
|
148
155
|
|
|
149
156
|
```bash
|
|
150
|
-
rm -rf .venv
|
|
151
157
|
rm -f .agent-browser-cli.log .agent-browser-cli.lock
|
|
152
158
|
rm -rf ~/.agents/skills/agent-browser-cli
|
|
153
159
|
```
|
|
@@ -157,7 +163,6 @@ Finally, remove the `TMWD CDP Bridge` extension from Chrome's extension manageme
|
|
|
157
163
|
## Ports
|
|
158
164
|
|
|
159
165
|
- `18765`: underlying `TMWebDriver` WebSocket, used by the Chrome extension.
|
|
160
|
-
- `18766`: underlying `TMWebDriver` HTTP `/link`, used by the internal master/remote protocol.
|
|
161
166
|
- `18767`: outer `agent-browser-cli` HTTP service, used by the CLI to reuse the session.
|
|
162
167
|
|
|
163
168
|
## Friendly Links
|
|
@@ -0,0 +1,266 @@
|
|
|
1
|
+
function findMainList(startElement = null) {
|
|
2
|
+
const root = startElement || document.body;
|
|
3
|
+
const MIN_CHILDREN = 8;
|
|
4
|
+
const MAX_CONTAINERS = 20;
|
|
5
|
+
|
|
6
|
+
// Global scan: collect candidate containers, ranked by l1 + l2*0.1 (l2 = grandchild count, which captures multi-level structures such as tables)
|
|
7
|
+
const candidates = [];
|
|
8
|
+
const allEls = root.querySelectorAll('*');
|
|
9
|
+
for (const node of allEls) {
|
|
10
|
+
if (node.closest('svg')) continue;
|
|
11
|
+
const l1 = node.children.length;
|
|
12
|
+
if (l1 < 5) continue;
|
|
13
|
+
let l2 = 0;
|
|
14
|
+
for (const child of node.children) l2 += child.children.length;
|
|
15
|
+
const score = l1 + l2 * 0.1;
|
|
16
|
+
if (score >= MIN_CHILDREN) candidates.push({node, score});
|
|
17
|
+
}
|
|
18
|
+
candidates.sort((a, b) => b.score - a.score);
|
|
19
|
+
const toProcess = candidates.slice(0, MAX_CONTAINERS).map(c => c.node);
|
|
20
|
+
|
|
21
|
+
// For each container, find candidate groups and score them
|
|
22
|
+
let allCandidates = [];
|
|
23
|
+
for (const container of toProcess) {
|
|
24
|
+
const topGroups = findTopGroups(container, 3);
|
|
25
|
+
for (const groupInfo of topGroups) {
|
|
26
|
+
const items = findMatchingElements(container, groupInfo.selector);
|
|
27
|
+
if (items.length >= 5) {
|
|
28
|
+
const score = scoreContainer(container, items) + groupInfo.score;
|
|
29
|
+
if (score >= 30) {
|
|
30
|
+
allCandidates.push({ container, selector: groupInfo.selector, items, score });
|
|
31
|
+
}
|
|
32
|
+
}
|
|
33
|
+
}
|
|
34
|
+
}
|
|
35
|
+
|
|
36
|
+
// Sort by score, descending
|
|
37
|
+
allCandidates.sort((a, b) => b.score - a.score);
|
|
38
|
+
|
|
39
|
+
// Deduplicate: drop results that overlap a higher-scoring candidate by more than 50%
|
|
40
|
+
const kept = [];
|
|
41
|
+
for (const cand of allCandidates) {
|
|
42
|
+
let dominated = false;
|
|
43
|
+
for (const k of kept) {
|
|
44
|
+
if (k.container.contains(cand.container) || cand.container.contains(k.container)) {
|
|
45
|
+
const kSet = new Set(k.items);
|
|
46
|
+
const overlap = cand.items.filter(it => kSet.has(it)).length;
|
|
47
|
+
if (overlap > cand.items.length * 0.5) { dominated = true; break; }
|
|
48
|
+
}
|
|
49
|
+
}
|
|
50
|
+
if (!dominated) kept.push(cand);
|
|
51
|
+
}
|
|
52
|
+
|
|
53
|
+
function describeResult(container, items, selector, score) {
|
|
54
|
+
if(container&&!container.id)container.id='_ljq'+(window._lci=(window._lci||0)+1);
|
|
55
|
+
const cTag = container ? container.tagName : null;
|
|
56
|
+
const cId = container ? (container.id || '') : '';
|
|
57
|
+
const cClass = container ? (String(container.className || '').trim()) : '';
|
|
58
|
+
const result = {
|
|
59
|
+
containerTag: cTag, containerId: cId, containerClass: cClass,
|
|
60
|
+
itemCount: items.length,
|
|
61
|
+
};
|
|
62
|
+
let prefix = '';
|
|
63
|
+
if (cId) prefix = '#' + CSS.escape(cId);
|
|
64
|
+
if (selector) result.selector = prefix ? (prefix + ' > ' + selector) : selector;
|
|
65
|
+
if (score !== undefined) result.score = score;
|
|
66
|
+
if (items.length > 0) {
|
|
67
|
+
result.firstItemPreview = items[0].outerHTML.substring(0, 200);
|
|
68
|
+
result.itemTags = items.slice(0, 10).map(el => el.tagName + (el.className ? '.' + String(el.className).trim().split(/\s+/)[0] : ''));
|
|
69
|
+
}
|
|
70
|
+
return result;
|
|
71
|
+
}
|
|
72
|
+
|
|
73
|
+
if (kept.length === 0) return [];
|
|
74
|
+
|
|
75
|
+
return kept.map(c => describeResult(c.container, c.items, c.selector, c.score));
|
|
76
|
+
}
|
|
77
|
+
|
|
78
|
+
function findTopGroups(container, limit) {
|
|
79
|
+
const children = Array.from(container.children).filter(c => !c.closest('svg'));
|
|
80
|
+
const totalChildren = children.length;
|
|
81
|
+
if (totalChildren < 3) return [];
|
|
82
|
+
|
|
83
|
+
const minGroupSize = Math.max(3, Math.floor(totalChildren * 0.2));
|
|
84
|
+
const groups = [];
|
|
85
|
+
|
|
86
|
+
// Count tags and class names
|
|
87
|
+
const tagFreq = {}, classFreq = {}, tagMap = {}, classMap = {};
|
|
88
|
+
|
|
89
|
+
children.forEach(child => {
|
|
90
|
+
// Count tags
|
|
91
|
+
const tag = child.tagName.toLowerCase();
|
|
92
|
+
if (tag === "td") return;
|
|
93
|
+
tagFreq[tag] = (tagFreq[tag] || 0) + 1;
|
|
94
|
+
if (!tagMap[tag]) tagMap[tag] = [];
|
|
95
|
+
tagMap[tag].push(child);
|
|
96
|
+
|
|
97
|
+
// Count class names
|
|
98
|
+
if (child.className) {
|
|
99
|
+
child.className.trim().split(/\s+/).forEach(cls => {
|
|
100
|
+
if (cls) {
|
|
101
|
+
classFreq[cls] = (classFreq[cls] || 0) + 1;
|
|
102
|
+
if (!classMap[cls]) classMap[cls] = [];
|
|
103
|
+
classMap[cls].push(child);
|
|
104
|
+
}
|
|
105
|
+
});
|
|
106
|
+
}
|
|
107
|
+
});
|
|
108
|
+
|
|
109
|
+
// Scoring function
|
|
110
|
+
const scoreGroup = (selector, elements) => {
|
|
111
|
+
const coverage = elements.length / totalChildren;
|
|
112
|
+
let specificity = selector.startsWith('.')
|
|
113
|
+
? (0.6 + (selector.match(/\./g).length - 1) * 0.1) // class selector
|
|
114
|
+
: (selector.includes('.')
|
|
115
|
+
? (0.7 + (selector.match(/\./g).length) * 0.1) // tag + class
|
|
116
|
+
: 0.3); // bare tag
|
|
117
|
+
return (coverage * 0.5) + (specificity * 0.5);
|
|
118
|
+
};
|
|
119
|
+
|
|
120
|
+
// Add tag groups
|
|
121
|
+
Object.keys(tagFreq).forEach(tag => {
|
|
122
|
+
if (tag !== "div" && tagFreq[tag] >= minGroupSize) {
|
|
123
|
+
groups.push({
|
|
124
|
+
selector: tag,
|
|
125
|
+
elements: tagMap[tag],
|
|
126
|
+
score: scoreGroup(tag, tagMap[tag]) - 0.5
|
|
127
|
+
});
|
|
128
|
+
}
|
|
129
|
+
});
|
|
130
|
+
|
|
131
|
+
// Add class groups
|
|
132
|
+
Object.keys(classFreq).forEach(cls => {
|
|
133
|
+
if (classFreq[cls] >= minGroupSize) {
|
|
134
|
+
const selector = '.' + CSS.escape(cls);
|
|
135
|
+
groups.push({
|
|
136
|
+
selector,
|
|
137
|
+
elements: classMap[cls],
|
|
138
|
+
score: scoreGroup(selector, classMap[cls])
|
|
139
|
+
});
|
|
140
|
+
}
|
|
141
|
+
});
|
|
142
|
+
// Add tag + class combinations
|
|
143
|
+
const topTags = Object.keys(tagFreq).filter(t => tagFreq[t] >= minGroupSize).slice(0, 3);
|
|
144
|
+
const topClasses = Object.keys(classFreq).filter(c => classFreq[c] >= minGroupSize).sort((a, b) => classFreq[b] - classFreq[a]).slice(0, 3);
|
|
145
|
+
|
|
146
|
+
// Tag + class
|
|
147
|
+
topTags.forEach(tag => {
|
|
148
|
+
topClasses.forEach(cls => {
|
|
149
|
+
const elements = children.filter(el =>
|
|
150
|
+
el.tagName.toLowerCase() === tag &&
|
|
151
|
+
el.className && el.className.split(/\s+/).includes(cls)
|
|
152
|
+
);
|
|
153
|
+
|
|
154
|
+
if (elements.length >= minGroupSize) {
|
|
155
|
+
const selector = tag + '.' + CSS.escape(cls);
|
|
156
|
+
groups.push({selector, elements, score: scoreGroup(selector, elements)});
|
|
157
|
+
}
|
|
158
|
+
});
|
|
159
|
+
});
|
|
160
|
+
|
|
161
|
+
// Multi-class combinations
|
|
162
|
+
for (let i = 0; i < topClasses.length; i++) {
|
|
163
|
+
for (let j = i + 1; j < topClasses.length; j++) {
|
|
164
|
+
const elements = children.filter(el =>
|
|
165
|
+
el.className && el.className.split(/\s+/).includes(topClasses[i]) && el.className.split(/\s+/).includes(topClasses[j]));
|
|
166
|
+
|
|
167
|
+
if (elements.length >= minGroupSize) {
|
|
168
|
+
const selector = '.' + CSS.escape(topClasses[i]) + '.' + CSS.escape(topClasses[j]);
|
|
169
|
+
groups.push({selector, elements,score: scoreGroup(selector, elements)});
|
|
170
|
+
}
|
|
171
|
+
}
|
|
172
|
+
}
|
|
173
|
+
// Return the N highest-scoring groups
|
|
174
|
+
return groups.sort((a, b) => b.score - a.score).slice(0, limit);
|
|
175
|
+
}
|
|
176
|
+
|
|
177
|
+
function findMatchingElements(container, selector) {
|
|
178
|
+
try {
|
|
179
|
+
return Array.from(container.querySelectorAll(selector));
|
|
180
|
+
} catch (e) {
|
|
181
|
+
// Handle invalid selectors
|
|
182
|
+
console.error('Invalid selector:', selector, e);
|
|
183
|
+
return [];
|
|
184
|
+
}
|
|
185
|
+
}
|
|
186
|
+
|
|
187
|
+
function scoreContainer(container, items) {
|
|
188
|
+
if (!container || items.length < 3) return 0;
|
|
189
|
+
// 1. Compute basic area data
|
|
190
|
+
const containerRect = container.getBoundingClientRect();
|
|
191
|
+
const containerArea = containerRect.width * containerRect.height;
|
|
192
|
+
if (containerArea < 10000) return 0; // container too small
|
|
193
|
+
|
|
194
|
+
// Collect per-item area data
|
|
195
|
+
const itemAreas = [];
|
|
196
|
+
let totalItemArea = 0;
|
|
197
|
+
let visibleItems = 0;
|
|
198
|
+
|
|
199
|
+
items.forEach(item => {
|
|
200
|
+
const rect = item.getBoundingClientRect();
|
|
201
|
+
const area = rect.width * rect.height;
|
|
202
|
+
if (area > 0) {
|
|
203
|
+
totalItemArea += area;
|
|
204
|
+
itemAreas.push(area);
|
|
205
|
+
visibleItems++;
|
|
206
|
+
}
|
|
207
|
+
});
|
|
208
|
+
// Too few visible items: return a score of 0
|
|
209
|
+
if (visibleItems < 3) return 0;
|
|
210
|
+
// Guard against outliers: cap the total item area at 98% of the container area
|
|
211
|
+
totalItemArea = Math.min(totalItemArea, containerArea * 0.98);
|
|
212
|
+
const areaRatio = totalItemArea / containerArea;
|
|
213
|
+
// 3. Compute the component scores, using continuous curves rather than step thresholds
|
|
214
|
+
// 3.2 Area-ratio score: up to 40 points, continuous curve
|
|
215
|
+
// Use a sigmoid so the score changes smoothly
|
|
216
|
+
const areaScore = 40 / (1 + Math.exp(-12 * (areaRatio - 0.4)));
|
|
217
|
+
|
|
218
|
+
// 3.3 Uniformity score: up to 20 points, continuous curve
|
|
219
|
+
let uniformityScore = 0;
|
|
220
|
+
if (itemAreas.length >= 3) {
|
|
221
|
+
const mean = itemAreas.reduce((sum, area) => sum + area, 0) / itemAreas.length;
|
|
222
|
+
const variance = itemAreas.reduce((sum, area) => sum + Math.pow(area - mean, 2), 0) / itemAreas.length;
|
|
223
|
+
const cv = mean > 0 ? Math.sqrt(variance) / mean : 1;
|
|
224
|
+
// Exponential decay: the smaller the cv, the higher the score
|
|
225
|
+
uniformityScore = 20 * Math.exp(-2.5 * cv);
|
|
226
|
+
}
|
|
227
|
+
|
|
228
|
+
const baseScore = Math.log2(visibleItems) * 5 + Math.floor(visibleItems / 5) * 0.25;
|
|
229
|
+
const rawCountScore = Math.min(40, baseScore);
|
|
230
|
+
const countScore = rawCountScore * Math.max(0.1, uniformityScore / 20);
|
|
231
|
+
|
|
232
|
+
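// 3.4 Container-size score: continuous curve (the original comment says up to 15 points, but the formula below yields less than 2)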
|
|
233
|
+
const viewportArea = window.innerWidth * window.innerHeight;
|
|
234
|
+
const containerViewportRatio = containerArea / viewportArea;
|
|
235
|
+
const sizeScore = 2 * (1 - 1/(1 + Math.exp(-10 * (containerViewportRatio - 0.25))));
|
|
236
|
+
|
|
237
|
+
let layoutScore = 0;
|
|
238
|
+
if (items.length >= 3) {
|
|
239
|
+
// Bucket coordinates and count distinct rows and columns
|
|
240
|
+
const uniqueRows = new Set(items.map(item => Math.round(item.getBoundingClientRect().top / 5) * 5)).size;
|
|
241
|
+
const uniqueCols = new Set(items.map(item => Math.round(item.getBoundingClientRect().left / 5) * 5)).size;
|
|
242
|
+
// A single row or single column gets full marks; otherwise assess grid quality
|
|
243
|
+
if (uniqueRows === 1 || uniqueCols === 1) { layoutScore = 20;
|
|
244
|
+
} else {
|
|
245
|
+
const coverage = Math.min(1, items.length / (uniqueRows * uniqueCols));
|
|
246
|
+
const efficiency = Math.max(0, 1 - (uniqueRows + uniqueCols) / (2 * items.length));
|
|
247
|
+
layoutScore = 20 * (0.7 * coverage + 0.3 * efficiency);
|
|
248
|
+
}
|
|
249
|
+
}
|
|
250
|
+
|
|
251
|
+
// Total score: still kept at roughly 100 points overall
|
|
252
|
+
const totalScore = countScore + areaScore + uniformityScore + layoutScore + sizeScore;
|
|
253
|
+
|
|
254
|
+
if (totalScore > 100)
|
|
255
|
+
console.log(container, {
|
|
256
|
+
total: totalScore.toFixed(2),
|
|
257
|
+
count: countScore.toFixed(2),
|
|
258
|
+
areaRatio: areaRatio.toFixed(2),
|
|
259
|
+
area: areaScore.toFixed(2),
|
|
260
|
+
uniformity: uniformityScore.toFixed(2),
|
|
261
|
+
size: sizeScore.toFixed(2),
|
|
262
|
+
layout: layoutScore.toFixed(2)
|
|
263
|
+
});
|
|
264
|
+
|
|
265
|
+
return totalScore;
|
|
266
|
+
}
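The component formulas inside `scoreContainer` can be tried outside the browser. The sketch below lifts three of them into pure functions so they run in plain Node without a DOM. The function names `areaScore`, `uniformityScore`, and `countScore` are hypothetical and do not exist in the package; the constants are copied from the code above.

```javascript
// Hypothetical standalone helpers; the formulas mirror scoreContainer above.

// Sigmoid centred on a 40% fill ratio, approaching 40 points as the
// items cover more of the container.
function areaScore(areaRatio) {
  return 40 / (1 + Math.exp(-12 * (areaRatio - 0.4)));
}

// Exponential decay on the coefficient of variation (cv) of item areas:
// equally sized items give cv = 0 and the full 20 points.
function uniformityScore(itemAreas) {
  if (itemAreas.length < 3) return 0;
  const mean = itemAreas.reduce((sum, a) => sum + a, 0) / itemAreas.length;
  const variance =
    itemAreas.reduce((sum, a) => sum + (a - mean) ** 2, 0) / itemAreas.length;
  const cv = mean > 0 ? Math.sqrt(variance) / mean : 1;
  return 20 * Math.exp(-2.5 * cv);
}

// Logarithmic item-count score, capped at 40 and scaled down when the
// items are not uniform (never below 10% of the raw value).
function countScore(visibleItems, uniformity) {
  const base = Math.log2(visibleItems) * 5 + Math.floor(visibleItems / 5) * 0.25;
  return Math.min(40, base) * Math.max(0.1, uniformity / 20);
}

console.log(areaScore(0.4));                    // midpoint of the sigmoid
console.log(uniformityScore([100, 100, 100]));  // identical areas
console.log(countScore(8, 20));                 // 8 uniform items
```

Splitting the formulas out this way makes it easier to see how each term moves: uniform, well-filled containers score high on all three, while a container whose children differ widely in size loses both the uniformity points and most of its count score.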
|
|
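The selector scoring used by `findTopGroups` in `simphtml_find_list.js` combines coverage (the share of children matched) with a specificity weight that depends on the selector's shape. A minimal sketch as a standalone function, assuming the caller passes counts instead of element arrays; the three-argument signature is hypothetical, since the original `scoreGroup` is a closure over `totalChildren`:

```javascript
// Hypothetical standalone version of the scoreGroup closure in findTopGroups.
function scoreGroup(selector, matchedCount, totalChildren) {
  const coverage = matchedCount / totalChildren;
  // Count the '.' characters to tell how many classes the selector names.
  const dots = (selector.match(/\./g) || []).length;
  let specificity;
  if (selector.startsWith('.')) {
    specificity = 0.6 + (dots - 1) * 0.1; // class-only selector
  } else if (dots > 0) {
    specificity = 0.7 + dots * 0.1;       // tag + class selector
  } else {
    specificity = 0.3;                    // bare tag selector
  }
  // Coverage and specificity carry equal weight.
  return coverage * 0.5 + specificity * 0.5;
}

console.log(scoreGroup('li', 10, 10));
console.log(scoreGroup('.item', 10, 10));
console.log(scoreGroup('li.item', 5, 10));
```

In the original code, bare-tag groups additionally have 0.5 subtracted from this score and `div` is skipped as a tag group, so class-based selectors win whenever both match the same children.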
@@ -0,0 +1,324 @@
|
|
|
1
|
+
function optHTML(text_only=false) {
|
|
2
|
+
function createEnhancedDOMCopy() {
|
|
3
|
+
const nodeInfo = new WeakMap();
|
|
4
|
+
const ignoreTags = ['SCRIPT', 'STYLE', 'NOSCRIPT', 'META', 'LINK', 'COLGROUP', 'COL', 'TEMPLATE', 'PARAM', 'SOURCE'];
|
|
5
|
+
const ignoreIds = ['agent-browser-cli-ind'];
|
|
6
|
+
function cloneNode(sourceNode, keep=false) {
|
|
7
|
+
if (sourceNode.nodeType === 8 ||
|
|
8
|
+
(sourceNode.nodeType === 1 && (
|
|
9
|
+
ignoreTags.includes(sourceNode.tagName) ||
|
|
10
|
+
(sourceNode.id && ignoreIds.includes(sourceNode.id))
|
|
11
|
+
))) {
|
|
12
|
+
return null;
|
|
13
|
+
}
|
|
14
|
+
if (sourceNode.nodeType === 3) return sourceNode.cloneNode(false);
|
|
15
|
+
const clone = sourceNode.cloneNode(false);
|
|
16
|
+
if ((sourceNode.tagName === 'INPUT' || sourceNode.tagName === 'TEXTAREA') && sourceNode.value) clone.setAttribute('value', sourceNode.value);
|
|
17
|
+
if (sourceNode.tagName === 'INPUT' && (sourceNode.type === 'radio' || sourceNode.type === 'checkbox') && sourceNode.checked) clone.setAttribute('checked', '');
|
|
18
|
+
else if (sourceNode.tagName === 'SELECT' && sourceNode.value) clone.setAttribute('data-selected', sourceNode.value);
|
|
19
|
+
try { if (sourceNode.matches && sourceNode.matches(':-webkit-autofill')) { clone.setAttribute('data-autofilled', 'true'); if (!sourceNode.value) clone.setAttribute('value', '⚠️受保护-读tmwebdriver_sop的autofill章节提取'); } } catch(e) {}
|
|
20
|
+
|
|
21
|
+
const isDropdown = sourceNode.classList?.contains('dropdown-menu') ||
|
|
22
|
+
/dropdown|menu/i.test(sourceNode.className) || sourceNode.getAttribute('role') === 'menu';
|
|
23
|
+
const _ddItems = isDropdown ? sourceNode.querySelectorAll('a, button, [role="menuitem"], li').length : 0;
|
|
24
|
+
const isSmallDropdown = _ddItems > 0 && _ddItems <= 7 && sourceNode.textContent.length < 500;
|
|
25
|
+
|
|
26
|
+
const childNodes = [];
|
|
27
|
+
for (const child of sourceNode.childNodes) {
|
|
28
|
+
const childClone = cloneNode(child, keep || isSmallDropdown);
|
|
29
|
+
if (childClone) childNodes.push(childClone);
|
|
30
|
+
}
|
|
31
|
+
if (sourceNode.tagName === 'IFRAME') {
|
|
32
|
+
try {
|
|
33
|
+
const iDoc = sourceNode.contentDocument || sourceNode.contentWindow?.document;
|
|
34
|
+
if (iDoc && iDoc.body && iDoc.body.children.length > 0) {
|
|
35
|
+
const wrapper = document.createElement('div');
|
|
36
|
+
wrapper.setAttribute('data-iframe-content', sourceNode.src || '');
|
|
37
|
+
for (const ch of iDoc.body.childNodes) {
|
|
38
|
+
const c = cloneNode(ch, keep);
|
|
39
|
+
if (c) wrapper.appendChild(c);
|
|
40
|
+
}
|
|
41
|
+
if (wrapper.childNodes.length) childNodes.push(wrapper);
|
|
42
|
+
}
|
|
43
|
+
} catch(e) {}
|
|
44
|
+
}
|
|
45
|
+
if (sourceNode.shadowRoot) {
|
|
46
|
+
for (const shadowChild of sourceNode.shadowRoot.childNodes) {
|
|
47
|
+
const shadowClone = cloneNode(shadowChild, keep);
|
|
48
|
+
if (shadowClone) childNodes.push(shadowClone);
|
|
49
|
+
}
|
|
50
|
+
}
|
|
51
|
+
|
|
52
|
+
const rect = sourceNode.getBoundingClientRect();
|
|
53
|
+
const style = window.getComputedStyle(sourceNode);
|
|
54
|
+
const area = (style.display === 'none' || style.visibility === 'hidden' || parseFloat(style.opacity) <= 0)?0:rect.width * rect.height;
|
|
55
|
+
const isVisible = (rect.width > 1 && rect.height > 1 &&
|
|
56
|
+
style.display !== 'none' && style.visibility !== 'hidden' &&
|
|
57
|
+
parseFloat(style.opacity) > 0 &&
|
|
58
|
+
Math.abs(rect.left) < 5000 && Math.abs(rect.top) < 5000)
|
|
59
|
+
|| isSmallDropdown;
|
|
60
|
+
const zIndex = style.position !== 'static' ? (parseInt(style.zIndex) || 0) : 0;
|
|
61
|
+
|
|
62
|
+
let info = {
|
|
63
|
+
rect, area, isVisible, isSmallDropdown, zIndex,
|
|
64
|
+
style: {
|
|
65
|
+
display: style.display, visibility: style.visibility,
|
|
66
|
+
opacity: style.opacity, position: style.position
|
|
67
|
+
}};
|
|
68
|
+
|
|
69
|
+
const nonTextChildren = childNodes.filter(child => child.nodeType !== 3);
|
|
70
|
+
const hasValidChildren = nonTextChildren.length > 0;
|
|
71
|
+
|
|
72
|
+
if (hasValidChildren) {
|
|
73
|
+
const childrenInfos = nonTextChildren.map(c => nodeInfo.get(c)).filter(i => i && i.rect && i.rect.width > 0 && i.rect.height > 0);
|
|
74
|
+
const bgAlpha = (() => {
|
|
75
|
+
const c = style.backgroundColor;
|
|
76
|
+
if (!c || c === 'transparent') return 0;
|
|
77
|
+
const m = c.match(/rgba?\([^)]+,\s*([\d.]+)\)/);
|
|
78
|
+
return m ? parseFloat(m[1]) : 1;
|
|
79
|
+
})();
|
|
80
|
+
const hasVisualBg = bgAlpha > 0.1 || style.backgroundImage !== 'none' || (style.backdropFilter && style.backdropFilter !== 'none') || style.boxShadow !== 'none';
|
|
81
|
+
|
|
82
|
+
if (!hasVisualBg && childrenInfos.length > 0) {
|
|
83
|
+
// Skip fixed/absolute children when computing parent's merged rect (they're out of flow)
|
|
84
|
+
const flowChildren = childrenInfos.filter(cInfo => cInfo.style && cInfo.style.position !== 'fixed' && cInfo.style.position !== 'absolute');
|
|
85
|
+
if (flowChildren.length > 0) {
|
|
86
|
+
let minL = Infinity, minT = Infinity, maxR = -Infinity, maxB = -Infinity;
|
|
87
|
+
for (const cInfo of flowChildren) {
|
|
88
|
+
minL = Math.min(minL, cInfo.rect.left);
|
|
89
|
+
minT = Math.min(minT, cInfo.rect.top);
|
|
90
|
+
maxR = Math.max(maxR, cInfo.rect.right);
|
|
91
|
+
maxB = Math.max(maxB, cInfo.rect.bottom);
|
|
92
|
+
}
|
|
93
|
+
info.rect = { left: minL, top: minT, right: maxR, bottom: maxB, width: maxR - minL, height: maxB - minT };
|
|
94
|
+
info.area = info.rect.width * info.rect.height;
|
|
95
|
+
} else {
|
|
96
|
+
const maxC = childrenInfos.filter(i => i.isVisible).sort((a, b) => b.area - a.area)[0];
|
|
97
|
+
if (maxC && maxC.area > 10000 && (!isVisible || maxC.area > info.area * 5)) info = maxC;
|
|
98
|
+
}
|
|
99
|
+
}
|
|
100
|
+
}
|
|
101
|
+
|
|
102
|
+
if (sourceNode.nodeType === 1 && sourceNode.tagName === 'DIV') {
|
|
103
|
+
if (!hasValidChildren && !sourceNode.textContent.trim()) return null;
|
|
104
|
+
}
|
|
105
|
+
// aria-hidden + not visible = truly hidden (e.g. mobile menus), remove even if has children
|
|
106
|
+
if (sourceNode.getAttribute && sourceNode.getAttribute('aria-hidden') === 'true' && !info.isVisible) {
|
|
107
|
+
return null;
|
|
108
|
+
}
|
|
109
|
+
if (info.isVisible || hasValidChildren || keep) {
|
|
110
|
+
childNodes.forEach(child => clone.appendChild(child));
|
|
111
|
+
return clone;
|
|
112
|
+
}
|
|
113
|
+
return null;
|
|
114
|
+
}
|
|
115
|
+
|
|
116
|
+
return {
|
|
117
|
+
domCopy: cloneNode(document.body),
|
|
118
|
+
getNodeInfo: node => nodeInfo.get(node),
|
|
119
|
+
isVisible: node => {
|
|
120
|
+
const info = nodeInfo.get(node);
|
|
121
|
+
return info && info.isVisible;
|
|
122
|
+
}
|
|
123
|
+
};
|
|
124
|
+
}
|
|
125
|
+
const { domCopy, getNodeInfo, isVisible } = createEnhancedDOMCopy();
|
|
126
|
+
if (text_only) {
|
|
127
|
+
const blocks = new Set(['DIV','P','H1','H2','H3','H4','H5','H6','LI','TR','SECTION','ARTICLE','HEADER','FOOTER','NAV','BLOCKQUOTE','PRE','HR','BR','DT','DD','FIGCAPTION','DETAILS','SUMMARY']);
|
|
128
|
+
domCopy.querySelectorAll('*').forEach(el => {
|
|
129
|
+
if (blocks.has(el.tagName)) el.insertAdjacentText('beforebegin', '\n');
|
|
130
|
+
});
|
|
131
|
+
domCopy.querySelectorAll('input:not([type=hidden]),textarea,select').forEach(el=>{
|
|
132
|
+
const p=[el.tagName,el.id&&'#'+el.id,el.getAttribute('name')&&'name='+el.getAttribute('name'),el.tagName==='INPUT'&&'type='+(el.getAttribute('type')||'text'),el.getAttribute('placeholder')&&'"'+el.getAttribute('placeholder')+'"',el.getAttribute('data-autofilled')&&'autofilled',el.disabled&&'disabled',el.tagName==='SELECT'&&el.getAttribute('data-selected')&&'="'+el.getAttribute('data-selected')+'"'].filter(Boolean).join(' ');
|
|
133
|
+
el.insertAdjacentText('beforebegin','\n['+p+']\n');
|
|
134
|
+
});
|
|
135
|
+
domCopy.querySelectorAll('button[disabled]').forEach(el=>el.insertAdjacentText('beforebegin','[DISABLED] '));
|
|
136
|
+
return domCopy.textContent;
|
|
137
|
+
}
|
|
138
|
+
const viewportArea = window.innerWidth * window.innerHeight;
|
|
139
|
+
|
|
140
|
+
function analyzeNode(node, pPathType='main') {
|
|
141
|
+
// Handle non-element nodes and leaf nodes
|
|
142
|
+
if (node.nodeType !== 1 || !node.children.length) {
|
|
143
|
+
node.nodeType === 1 && (node.dataset.mark = 'K:leaf');
|
|
144
|
+
return;
|
|
145
|
+
}
|
|
146
|
+
const pathType = (node.dataset.mark === 'K:secondary') ? 'second' : pPathType;
|
|
147
|
+
const nodeInfoData = getNodeInfo(node);
|
|
148
|
+
if (!nodeInfoData || !nodeInfoData.rect) return;
|
|
149
|
+
const rectn = nodeInfoData.rect;
|
|
150
|
+
if (rectn.width < window.innerWidth * 0.8 && rectn.height < window.innerHeight * 0.8) return node;
|
|
151
|
+
if (node.tagName === 'TABLE') return;
|
|
152
|
+
const children = Array.from(node.children);
|
|
153
|
+
if (children.length === 1) {
|
|
154
|
+
node.dataset.mark = 'K:container';
|
|
155
|
+
return analyzeNode(children[0], pathType);
|
|
156
|
+
}
|
|
157
|
+
if (children.length > 10) return;
|
|
158
|
+
|
|
159
|
+
// Gather child info and sort it
|
|
160
|
+
const childrenInfo = children.map(child => {
|
|
161
|
+
const info = getNodeInfo(child) || { rect: {}, style: {} };
|
|
162
|
+
return { node: child, rect: info.rect, style: info.style,
|
|
163
|
+
area: info.area, zIndex: (info.zIndex || 0), isVisible: info.isVisible };
|
|
164
|
+
});
|
|
165
|
+
childrenInfo.sort((a, b) => b.area - a.area);
|
|
166
|
+
|
|
167
|
+
// Detect whether the children partition the parent or overlay one another
|
|
168
|
+
const isOverlay = hasOverlap(childrenInfo);
|
|
169
|
+
node.dataset.mark = isOverlay ? 'K:overlayParent' : 'K:partitionParent';
|
|
170
|
+
|
|
171
|
+
if (isOverlay) handleOverlayContainer(childrenInfo, pathType);
|
|
172
|
+
else handlePartitionContainer(childrenInfo, pathType);
|
|
173
|
+
|
|
174
|
+
console.log(`${isOverlay ? '覆盖' : '划分'}容器:`, node, `子元素数量: ${children.length}`);
|
|
175
|
+
console.log('子元素及标记:', children.map(child => ({
|
|
176
|
+
element: child,
|
|
177
|
+
mark: child.dataset.mark || '无',
|
|
178
|
+
info: getNodeInfo ? getNodeInfo(child) : undefined
|
|
179
|
+
})));
|
|
180
|
+
for (const child of children)
|
|
181
|
+
if (!child.dataset.mark || child.dataset.mark[0] !== 'R') analyzeNode(child, pathType);
|
|
182
|
+
}
|
|
183
|
+
|
|
184
|
+
// Handle a partition container
|
|
185
|
+
function handlePartitionContainer(childrenInfo, pathType) {
|
|
186
|
+
childrenInfo.sort((a, b) => b.area - a.area);
|
|
187
|
+
const totalArea = childrenInfo.reduce((sum, item) => sum + item.area, 0);
|
|
188
|
+
console.log(childrenInfo[0].area / totalArea);
|
|
189
|
+
const hasMainElement = childrenInfo.length >= 1 &&
|
|
190
|
+
(childrenInfo[0].area / totalArea > 0.5) &&
|
|
191
|
+
(childrenInfo.length === 1 || childrenInfo[0].area > childrenInfo[1].area * 2);
|
|
192
|
+
if (hasMainElement) {
|
|
193
|
+
childrenInfo[0].node.dataset.mark = 'K:main';
|
|
194
|
+
for (let i = 1; i < childrenInfo.length; i++) {
|
|
195
|
+
const child = childrenInfo[i];
|
|
196
|
+
let className = (child.node.getAttribute('class') || '').toLowerCase();
|
|
197
|
+
let isSecondary = containsButton(child.node);
|
|
198
|
+
if (className.includes('nav')) isSecondary = true;
|
|
199
|
+
if (className.includes('breadcrumbs')) isSecondary = true;
|
|
200
|
+
if (className.includes('header') && className.includes('table')) isSecondary = true;
|
|
201
|
+
if (child.node.innerHTML.trim().replace(/\s+/g, '').length < 500) isSecondary = true;
|
|
202
|
+
if (child.node.textContent.trim().length > 200) isSecondary = true; // P3: keep the node when it has substantial text content
|
|
203
|
+
if (child.style.visibility === 'hidden') isSecondary = false;
|
|
204
|
+
if (isSecondary) child.node.dataset.mark = 'K:secondary';
|
|
205
|
+
else child.node.dataset.mark = 'K:nonEssential';
|
|
206
|
+
}
|
|
207
|
+
} else {
|
|
208
|
+
return; // relaxed: skip equalmany filtering, list truncation handles token budget
|
|
209
|
+
const uniqueClassNames = new Set(childrenInfo.map(item => item.node.getAttribute('class') || '')).size;
|
|
210
|
+
const highClassNameVariety = uniqueClassNames >= childrenInfo.length * 0.8;
|
|
211
|
+
if (pathType !== 'main' && highClassNameVariety && childrenInfo.length > 5) {
|
|
212
|
+
childrenInfo.forEach(child => child.node.dataset.mark = 'R:equalmany');
|
|
213
|
+
} else {
|
|
214
|
+
childrenInfo.forEach(child => child.node.dataset.mark = 'K:equal');
|
|
215
|
+
}
|
|
216
|
+
}
|
|
217
|
+
}
|
|
218
|
+
|
|
219
|
+
function containsButton(container) {
|
|
220
|
+
const hasStandardButton = container.querySelector('button, input[type="button"], input[type="submit"], [role="button"]') !== null;
|
|
221
|
+
if (hasStandardButton) return true;
|
|
222
|
+
const hasClassButton = container.querySelector('[class*="-btn"], [class*="-button"], .button, .btn, [class*="btn-"]') !== null;
|
|
223
|
+
return hasClassButton;
|
|
224
|
+
}
|
|
225
|
+
|
|
226
|
+
function handleOverlayContainer(childrenInfo, pathType) {
|
|
227
|
+
// elementFromPoint ground truth: let the browser tell us which element is visually on top
|
|
228
|
+
const _efp = document.elementFromPoint(window.innerWidth/2, window.innerHeight/2);
|
|
229
|
+
if (_efp) { let _el = _efp; while (_el) { const _h = childrenInfo.find(c => c.node.id && c.node.id === _el.id); if (_h) { _h.zIndex = 9999; break; } _el = _el.parentElement; } }
|
|
230
|
+
const sorted = [...childrenInfo].sort((a, b) => b.zIndex - a.zIndex);
|
|
231
|
+
console.log('Sorted children:', sorted);
|
|
232
|
+
if (sorted.length === 0) return;
|
|
233
|
+
|
|
234
|
+
const top = sorted[0];
|
|
235
|
+
const rect = top.rect;
|
|
236
|
+
const topNode = top.node;
|
|
237
|
+
const isComplex = top.node.querySelectorAll('input, select, textarea, button, a, [role="button"]').length >= 1;
|
|
238
|
+
|
|
239
|
+
const textContent = topNode.textContent?.trim() || '';
|
|
240
|
+
const textLength = textContent.length;
|
|
241
|
+
const hasLinks = topNode.querySelectorAll('a').length > 0;
|
|
242
|
+
const isMostlyText = textLength > 7 && !hasLinks;
|
|
243
|
+
|
|
244
|
+
const centerDiff = Math.abs((rect.left + rect.width/2) - window.innerWidth/2) / window.innerWidth;
|
|
245
|
+
const minDimensionRatio = Math.min(rect.width / window.innerWidth, rect.height / window.innerHeight);
|
|
246
|
+
const maxDimensionRatio = Math.max(rect.width / window.innerWidth, rect.height / window.innerHeight);
|
|
247
|
+
const isNearTop = rect.top < 50;
|
|
248
|
+
const isDialog = (top.node.querySelector('iframe') || top.node.querySelector('button') || top.node.querySelector('input')) && centerDiff < 0.3;
|
|
249
|
+
|
|
250
|
+
if (isComplex && centerDiff < 0.2 &&
|
|
251
|
+
((minDimensionRatio > 0.2 && rect.width/window.innerWidth < 0.98) || minDimensionRatio > 0.95)) {
|
|
252
|
+
top.node.dataset.mark = 'K:mainInteractive';
|
|
253
|
+
sorted.slice(1).forEach(e => {
|
|
254
|
+
if ((parseInt(e.zIndex)||0) <= (parseInt(sorted[0].zIndex)||0)) {
|
|
255
|
+
e.node.dataset.mark = 'R:covered';
|
|
256
|
+
} else {
|
|
257
|
+
e.node.dataset.mark = 'K:noncovered';
|
|
258
|
+
}
|
|
259
|
+
});
|
|
260
|
+
} else {
|
|
261
|
+
if (isComplex && isNearTop && maxDimensionRatio > 0.4 && top.isVisible) {
|
|
262
|
+
top.node.dataset.mark = 'K:topBar';
|
|
263
|
+
} else if (isMostlyText || isComplex || isDialog) {
|
|
264
|
+
topNode.dataset.mark = 'K:messageContent';
|
|
265
|
+
} else {
|
|
266
|
+
topNode.dataset.mark = 'R:floatingAd';
|
|
267
|
+
}
|
|
268
|
+
const rest = sorted.slice(1);
|
|
269
|
+
rest.length && (!hasOverlap(rest) ? handlePartitionContainer(rest, pathType) : handleOverlayContainer(rest, pathType));
|
|
270
|
+
}
|
|
271
|
+
}
|
|
272
|
+
|
|
273
|
+
function hasOverlap(items) {
|
|
274
|
+
return items.some((a, i) =>
|
|
275
|
+
items.slice(i+1).some(b => {
|
|
276
|
+
const r1 = a.rect, r2 = b.rect;
|
|
277
|
+
if (!r1.width || !r2.width || !r1.height || !r2.height) {return false;}
|
|
278
|
+
const epsilon = 1;
|
|
279
|
+
const x1 = r1.x !== undefined ? r1.x : r1.left;
|
|
280
|
+
const y1 = r1.y !== undefined ? r1.y : r1.top;
|
|
281
|
+
const x2 = r2.x !== undefined ? r2.x : r2.left;
|
|
282
|
+
const y2 = r2.y !== undefined ? r2.y : r2.top;
|
|
283
|
+
return !(x1 + r1.width <= x2 + epsilon || x1 >= x2 + r2.width - epsilon ||
|
|
284
|
+
y1 + r1.height <= y2 + epsilon || y1 >= y2 + r2.height - epsilon
|
|
285
|
+
);
|
|
286
|
+
})
|
|
287
|
+
);
|
|
288
|
+
}
|
|
289
|
+
|
|
290
|
+
// Hoist top 1-2 deep fixed dialogs to body level for overlay detection
|
|
291
|
+
const _fc = [...domCopy.querySelectorAll('*')].filter(el => {
|
|
292
|
+
if (el.parentNode === domCopy) return false;
|
|
293
|
+
const info = getNodeInfo(el);
|
|
294
|
+
if (!info?.rect || info.style.position !== 'fixed') return false;
|
|
295
|
+
const r = info.rect, cover = (r.width * r.height) / viewportArea;
|
|
296
|
+
const cd = Math.abs((r.left + r.width/2) - window.innerWidth/2) / window.innerWidth;
|
|
297
|
+
return cover > 0.15 && cover < 1.0 && cd < 0.3 && el.querySelector('button, input, a, [role="button"], iframe');
|
|
298
|
+
}).filter((el, _, arr) => !arr.some(o => o !== el && o.contains(el)))
|
|
299
|
+
.sort((a, b) => (getNodeInfo(b).rect.width * getNodeInfo(b).rect.height) - (getNodeInfo(a).rect.width * getNodeInfo(a).rect.height))
|
|
300
|
+
.slice(0, 2);
|
|
301
|
+
_fc.forEach(el => { const r = getNodeInfo(el).rect; console.log('[simphtml] Hoisted fixed dialog:', el.tagName + (el.id ? '#'+el.id : '') + (el.className ? '.'+String(el.className).split(' ')[0] : ''), Math.round(r.width)+'x'+Math.round(r.height), Math.round(100*r.width*r.height/viewportArea)+'%'); el.parentNode.removeChild(el); domCopy.appendChild(el); });
|
|
302
|
+
const result = analyzeNode(domCopy);
|
|
303
|
+
domCopy.querySelectorAll('[data-mark^="R:"]').forEach(el=>el.parentNode?.removeChild(el));
|
|
304
|
+
let root = domCopy;
|
|
305
|
+
while (root.children.length === 1) {
|
|
306
|
+
root = root.children[0];
|
|
307
|
+
}
|
|
308
|
+
for (let ii = 0; ii < 3; ii++) {
|
|
309
|
+
root.querySelectorAll('div').forEach(div => (!div.textContent.trim() && div.children.length === 0) && div.remove());
|
|
310
|
+
}
|
|
311
|
+
root.querySelectorAll('[data-mark]').forEach(e => e.removeAttribute('data-mark'));
|
|
312
|
+
root.removeAttribute('data-mark');
|
|
313
|
+
root.querySelectorAll('iframe').forEach(f => {
|
|
314
|
+
if (f.children.length) {
|
|
315
|
+
const d = document.createElement('div');
|
|
316
|
+
for (const a of f.attributes) d.setAttribute(a.name, a.value);
|
|
317
|
+
d.setAttribute('data-tag', 'iframe');
|
|
318
|
+
while (f.firstChild) d.appendChild(f.firstChild);
|
|
319
|
+
f.parentNode.replaceChild(d, f);
|
|
320
|
+
}
|
|
321
|
+
});
|
|
322
|
+
return root.outerHTML;
|
|
323
|
+
}
|
|
324
|
+
optHTML()
|
package/package.json
CHANGED
|
@@ -1,13 +1,14 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@sleepinsummer/agent-browser-cli",
|
|
3
|
-
"version": "0.2.
|
|
3
|
+
"version": "0.2.1",
|
|
4
4
|
"description": "Agent-oriented browser sensing and control CLI backed by a native Rust daemon.",
|
|
5
5
|
"license": "MIT",
|
|
6
6
|
"bin": {
|
|
7
7
|
"agent-browser-cli": "npm/bin/agent-browser-cli.js"
|
|
8
8
|
},
|
|
9
9
|
"files": [
|
|
10
|
-
"npm",
|
|
10
|
+
"npm/bin",
|
|
11
|
+
"npm/postinstall.js",
|
|
11
12
|
"assets",
|
|
12
13
|
"skills",
|
|
13
14
|
"README.md",
|
|
@@ -20,9 +21,9 @@
|
|
|
20
21
|
"postinstall": "node npm/postinstall.js"
|
|
21
22
|
},
|
|
22
23
|
"optionalDependencies": {
|
|
23
|
-
"@sleepinsummer/agent-browser-cli-darwin-arm64": "0.2.
|
|
24
|
-
"@sleepinsummer/agent-browser-cli-darwin-x64": "0.2.
|
|
25
|
-
"@sleepinsummer/agent-browser-cli-win32-x64": "0.2.
|
|
24
|
+
"@sleepinsummer/agent-browser-cli-darwin-arm64": "0.2.1",
|
|
25
|
+
"@sleepinsummer/agent-browser-cli-darwin-x64": "0.2.1",
|
|
26
|
+
"@sleepinsummer/agent-browser-cli-win32-x64": "0.2.1"
|
|
26
27
|
},
|
|
27
28
|
"engines": {
|
|
28
29
|
"node": ">=18"
|
package/skills/agent-browser-cli/SKILL.md
CHANGED
|
@@ -5,7 +5,7 @@ description: Use agent-browser-cli for browser sensing and control. Applicable to
|
|
|
5
5
|
|
|
6
6
|
# agent-browser-cli
|
|
7
7
|
|
|
8
|
-
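Use `agent-browser-cli` for browser control. Under the hood, a resident Rust service and a Chrome extension take over the user's browser, preserving login state and cookies; it is not Selenium/Playwright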
|
|
8
|
+
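Use `agent-browser-cli` for browser control. Under the hood, a resident Rust service and a Chrome extension take over the user's browser, preserving login state and cookies; it is not Selenium/Playwright.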
|
|
9
9
|
|
|
10
10
|
## Project path
|
|
11
11
|
|
|
@@ -73,18 +73,8 @@ agent-browser-cli restart
|
|
|
73
73
|
|
|
74
74
|
Resident service ports:
|
|
75
75
|
- `18765`: the underlying `TMWebDriver` WebSocket, used by the Chrome extension connection.
|
|
76
|
-
- `18766`: the underlying `TMWebDriver` HTTP `/link` endpoint, used for the internal master/remote protocol.
|
|
77
76
|
- `18767`: the outer `agent-browser-cli` HTTP service, which lets the CLI reuse sessions.
|
|
78
77
|
|
|
79
|
-
The legacy Python implementation is only for fallback troubleshooting or development:
|
|
80
|
-
|
|
81
|
-
```bash
|
|
82
|
-
.venv/bin/python - <<'PY'
|
|
83
|
-
import ga
|
|
84
|
-
print(ga.web_scan(tabs_only=True))
|
|
85
|
-
PY
|
|
86
|
-
```
|
|
87
|
-
|
|
88
78
|
Signs of success:
|
|
89
79
|
- The call returns `status=success`
|
|
90
80
|
- `tabs_count` appears in the output
|
|
@@ -127,38 +117,29 @@ agent-browser-cli exec --tab 303987837 'document.querySelector("button").click()
|
|
|
127
117
|
|
|
128
118
|
## Basic usage
|
|
129
119
|
|
|
130
|
-
`
|
|
131
|
-
|
|
132
|
-
```python
|
|
133
|
-
import ga
|
|
120
|
+
`scan` handles sensing; `exec` handles precise operations. When a precise operation is possible, do not run a full scan.
|
|
134
121
|
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
|
|
138
|
-
|
|
122
|
+
```bash
|
|
123
|
+
agent-browser-cli scan --tabs-only
|
|
124
|
+
agent-browser-cli scan
|
|
125
|
+
agent-browser-cli scan --text-only
|
|
126
|
+
agent-browser-cli scan --tab 303987837
|
|
139
127
|
```
|
|
140
128
|
|
|
141
129
|
Plain page JS:
|
|
142
130
|
|
|
143
|
-
```
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
print(ga.web_execute_js("return document.title"))
|
|
147
|
-
print(ga.web_execute_js("""
|
|
148
|
-
return {
|
|
149
|
-
title: document.title,
|
|
150
|
-
url: location.href
|
|
151
|
-
}
|
|
152
|
-
"""))
|
|
131
|
+
```bash
|
|
132
|
+
agent-browser-cli exec 'return document.title'
|
|
133
|
+
agent-browser-cli exec 'return { title: document.title, url: location.href }'
|
|
153
134
|
```
|
|
154
135
|
|
|
155
|
-
`
|
|
136
|
+
When using `await` inside `exec`, you must `return` explicitly; otherwise the result may be `null`.
|
|
156
137
|
|
|
157
|
-
`
|
|
138
|
+
`scan` only reads the current page and does not navigate. To switch sites, use `open` or run the navigation through `exec`:
|
|
158
139
|
|
|
159
|
-
```
|
|
160
|
-
|
|
161
|
-
|
|
140
|
+
```bash
|
|
141
|
+
agent-browser-cli open https://example.com
|
|
142
|
+
agent-browser-cli exec "location.href='https://example.com'; return location.href"
|
|
162
143
|
```
|
|
163
144
|
|
|
164
145
|
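To open a new tab, prefer the native `open` command; do not use `window.open` together with `--monitor`. Under the hood, `open` goes through the extension's `chrome.tabs.create` and does not trigger a CDP debugger attach.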
|
|
@@ -174,13 +155,11 @@ JS events have `isTrusted=false`, so sensitive operations may be blocked by the page. JS click
|
|
|
174
155
|
|
|
175
156
|
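For cross-tab work, cookies, CDP, extension management, and browser content permissions, prefer passing a JSON string directly; do not assemble DOM nodes yourself.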
|
|
176
157
|
|
|
177
|
-
```
|
|
178
|
-
|
|
179
|
-
|
|
180
|
-
|
|
181
|
-
|
|
182
|
-
print(ga.web_execute_js('{"cmd":"cdp","tabId":303987837,"method":"Page.captureScreenshot","params":{"format":"png"}}'))
|
|
183
|
-
print(ga.web_execute_js('{"cmd":"batch","tabId":303987837,"commands":[{"cmd":"tabs"},{"cmd":"cookies"}]}'))
|
|
158
|
+
```bash
|
|
159
|
+
agent-browser-cli exec '{"cmd":"tabs"}'
|
|
160
|
+
agent-browser-cli exec '{"cmd":"cookies"}'
|
|
161
|
+
agent-browser-cli exec '{"cmd":"cdp","tabId":303987837,"method":"Page.captureScreenshot","params":{"format":"png"}}'
|
|
162
|
+
agent-browser-cli exec '{"cmd":"batch","tabId":303987837,"commands":[{"cmd":"tabs"},{"cmd":"cookies"}]}'
|
|
184
163
|
```
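The commands above pass a hand-written JSON string to `exec`. As a minimal sketch, assuming the string is assembled from a Node.js script, it can be generated with `JSON.stringify` so that quoting mistakes are less likely. `buildBridgeCommand` and `shellQuote` are hypothetical helper names, not part of the CLI; only the `cmd`, `tabId`, `method`, `params`, and `commands` keys are taken from the examples above.

```javascript
// Hypothetical helper: serialise a bridge command object into the JSON string
// that the examples above pass to `agent-browser-cli exec`.
function buildBridgeCommand(cmd, fields = {}) {
  return JSON.stringify({ cmd, ...fields });
}

// Hypothetical helper: wrap a string in single quotes for a POSIX shell,
// escaping any embedded single quote as '\''.
function shellQuote(text) {
  return "'" + text.replace(/'/g, "'\\''") + "'";
}

const screenshot = buildBridgeCommand("cdp", {
  tabId: 303987837,
  method: "Page.captureScreenshot",
  params: { format: "png" },
});

// Prints a command line with the same shape as the screenshot example above.
console.log("agent-browser-cli exec " + shellQuote(screenshot));
```

The quoting helper targets POSIX shells only; Windows shells quote differently and would need their own variant.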
|
|
185
164
|
|
|
186
165
|
Common commands:
|
|
@@ -263,11 +242,11 @@ return fetch('PDF_URL').then(r => r.blob()).then(b => {
|
|
|
263
242
|
});
|
|
264
243
|
```
|
|
265
244
|
|
|
266
|
-
For Google image search, do not hard-code obfuscated classes. When clicking a result, look for the `[role=button]` container first; `
|
|
245
|
+
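For Google image search, do not hard-code obfuscated classes. When clicking a result, look for the `[role=button]` container first; `scan` may filter out the side panel, so after it opens read `document.body.innerText` with JS; for the full-size image, iterate over `img` elements and take the `src` of the one with the largest `naturalWidth`; for the "Visit" link, iterate over `a` elements and take the `href` of the one where `textContent.includes('访问')`; for thumbnails, extract `img[src^="data:image"]` directly.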
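The image and link selection rules in this paragraph can be sketched as two small functions. This is an illustration under assumptions, not code from the package: `pickLargestImageSrc` and `findLinkByText` are hypothetical names, and they accept any iterable of element-like objects so the logic can be exercised outside a browser.

```javascript
// Return the src of the image with the largest naturalWidth, or null if there are none.
function pickLargestImageSrc(images) {
  let best = null;
  for (const img of images) {
    if (!best || img.naturalWidth > best.naturalWidth) best = img;
  }
  return best ? best.src : null;
}

// Return the href of the first link whose text contains the given label, or null.
function findLinkByText(links, label) {
  for (const a of links) {
    if ((a.textContent || "").includes(label)) return a.href;
  }
  return null;
}

// In a page context (for example inside an `exec` snippet) these would be called as:
//   pickLargestImageSrc(document.querySelectorAll("img"))
//   findLinkByText(document.querySelectorAll("a"), "访问")
```

Selecting by `naturalWidth` and link text avoids depending on class names, which is the point the paragraph makes about obfuscated classes.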
|
|
267
246
|
|
|
268
247
|
## iframe, Shadow DOM, and screenshots
|
|
269
248
|
|
|
270
|
-
Same-origin iframes are traversed automatically by `
|
|
249
|
+
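Same-origin iframes are traversed automatically by `scan`. For cross-origin iframes, prefer CDP: use `Page.getFrameTree` to find the `frameId`, then `Page.createIsolatedWorld` to obtain a `contextId`, and finally `Runtime.evaluate` to execute in the iframe's context.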
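The three CDP steps can be sketched as follows. This assumes the bridge returns the standard `Page.getFrameTree` shape (`{ frame, childFrames }` nodes) and that the `{"cmd":"cdp",...}` envelope used elsewhere in this file applies; `findFrameId`, `isolatedWorldCommand`, `evaluateCommand`, and the `worldName` value are hypothetical. In the Chrome DevTools Protocol, `Page.createIsolatedWorld` returns the context as `executionContextId`, which is then passed as `contextId` to `Runtime.evaluate`.

```javascript
// Step 1 payload: ask for the frame tree of the current page.
const getFrameTreeCommand = { cmd: "cdp", method: "Page.getFrameTree" };

// Walk a frame tree node ({ frame, childFrames }) and return the id of the
// first frame whose URL contains urlPart, or null when nothing matches.
function findFrameId(frameTree, urlPart) {
  if (frameTree.frame && String(frameTree.frame.url).includes(urlPart)) {
    return frameTree.frame.id;
  }
  for (const child of frameTree.childFrames || []) {
    const found = findFrameId(child, urlPart);
    if (found) return found;
  }
  return null;
}

// Step 2 payload: create an isolated world inside the chosen frame.
function isolatedWorldCommand(frameId) {
  return { cmd: "cdp", method: "Page.createIsolatedWorld", params: { frameId, worldName: "agent" } };
}

// Step 3 payload: evaluate an expression in that world's execution context.
function evaluateCommand(contextId, expression) {
  return { cmd: "cdp", method: "Runtime.evaluate", params: { contextId, expression, returnByValue: true } };
}
```

Each payload would be serialised with `JSON.stringify` and passed to `exec`; how the bridge wraps the CDP response is not shown in this file, so reading `frameTree` and `executionContextId` out of the result is left to the caller.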
|
|
271
250
|
|
|
272
251
|
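When CDP-clicking an element inside an iframe, the coordinates must be composed: `finalX = iframeRect.x + elRect.x`, `finalY = iframeRect.y + elRect.y`. In the current CDP bridge, `Target.getTargets` / `Target.attachToTarget` usually return `Not allowed`, so do not try that route first. postMessage relaying is reliable only when the content script has already been injected into the iframe; it is usually unavailable for third-party payment iframes.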
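The coordinate composition can be illustrated with a short sketch. `iframeClickPoint` and `clickCommands` are hypothetical names. Offsetting by half the element's width and height to aim at its centre is an addition for illustration; with a zero-size rectangle the result reduces to the formula given above. `Input.dispatchMouseEvent` and its `type`, `x`, `y`, `button`, and `clickCount` parameters are standard CDP.

```javascript
// Compose top-level viewport coordinates for an element inside an iframe:
// the iframe's offset in the top page plus the element's offset inside the iframe.
// Half the element size is added so the point lands on the element's centre.
function iframeClickPoint(iframeRect, elRect) {
  return {
    x: iframeRect.x + elRect.x + (elRect.width || 0) / 2,
    y: iframeRect.y + elRect.y + (elRect.height || 0) / 2,
  };
}

// Build press and release commands for the bridge's `cdp` envelope.
function clickCommands(point) {
  return ["mousePressed", "mouseReleased"].map((type) => ({
    cmd: "cdp",
    method: "Input.dispatchMouseEvent",
    params: { type, x: point.x, y: point.y, button: "left", clickCount: 1 },
  }));
}

const point = iframeClickPoint(
  { x: 100, y: 50 },
  { x: 10, y: 20, width: 40, height: 10 }
);
console.log(JSON.stringify(clickCommands(point)));
```

Both rectangles are assumed to come from `getBoundingClientRect()`, the iframe's measured in the top document and the element's measured inside the iframe.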
|
|
273
252
|
|
|
@@ -277,33 +256,21 @@ For closed Shadow DOM, use `DOM.getDocument({depth:-1,pierce:true})`, then level by level `
|
|
|
277
256
|
|
|
278
257
|
For screenshots, prefer CDP:
|
|
279
258
|
|
|
280
|
-
```
|
|
281
|
-
|
|
282
|
-
print(ga.web_execute_js('{"cmd":"cdp","method":"Page.captureScreenshot","params":{"format":"png"}}'))
|
|
259
|
+
```bash
|
|
260
|
+
agent-browser-cli exec '{"cmd":"cdp","method":"Page.captureScreenshot","params":{"format":"png"}}'
|
|
283
261
|
```
|
|
284
262
|
|
|
285
263
|
For CAPTCHA canvas/img elements, prefer JS `canvas.toDataURL()` or read the image `src` directly.
|
|
286
264
|
|
|
287
265
|
## Autofill and login
|
|
288
266
|
|
|
289
|
-
`
|
|
267
|
+
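If an input in the `scan` output carries `data-autofilled="true"`, its value may appear as a protected placeholder rather than the real value. Chrome releases autofill-protected values only in the foreground tab, so you must first call CDP `Page.bringToFront`.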
|
|
290
268
|
|
|
291
269
|
One-shot release flow: `Page.bringToFront` -> `mousePressed` on any field (`mouseReleased` is usually not needed) -> wait 500ms -> re-dispatch the `input/change` events -> click login.
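The release flow can be written down as an ordered plan, which makes the sequence easier to review before running it. This is a sketch under assumptions: `autofillReleasePlan`, `resendEventsSnippet`, and the `step` field are hypothetical, and the final login click is left to the caller because the login button is page specific.

```javascript
// Build the JS snippet that re-dispatches input/change on a field and returns its value.
function resendEventsSnippet(selector) {
  return (
    `const el = document.querySelector(${JSON.stringify(selector)});` +
    `for (const t of ["input", "change"]) el.dispatchEvent(new Event(t, { bubbles: true }));` +
    `return el.value;`
  );
}

// Return the ordered steps of the flow: bring the tab to the front, press the
// mouse on a field, wait 500 ms, then re-dispatch the input/change events.
function autofillReleasePlan(field) {
  return [
    { step: "cdp", method: "Page.bringToFront" },
    {
      step: "cdp",
      method: "Input.dispatchMouseEvent",
      params: { type: "mousePressed", x: field.x, y: field.y, button: "left", clickCount: 1 },
    },
    { step: "wait", ms: 500 },
    { step: "js", code: resendEventsSnippet(field.selector) },
  ];
}

console.log(autofillReleasePlan({ x: 320, y: 240, selector: "#username" }));
```

A driver script would translate each `cdp` step into the bridge's `{"cmd":"cdp",...}` form and each `js` step into an `exec` call, sleeping for the `wait` step in between.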
|
|
292
270
|
|
|
293
271
|
## Debugging
|
|
294
272
|
|
|
295
|
-
|
|
296
|
-
|
|
297
|
-
```python
|
|
298
|
-
from TMWebDriver import TMWebDriver
|
|
299
|
-
import simphtml
|
|
300
|
-
|
|
301
|
-
d = TMWebDriver()
|
|
302
|
-
d.set_session('url_pattern')
|
|
303
|
-
print(d.execute_js('return document.title'))
|
|
304
|
-
```
|
|
305
|
-
|
|
306
|
-
`simphtml.optimize_html_for_tokens(html)` returns a BeautifulSoup Tag; wrap it in `str(...)` before displaying it.
|
|
273
|
+
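Debugging page simplification requires injecting JS into a real browser; local static parsing cannot simulate the DOM. Prefer `scan --text-only` and small `exec` snippets to narrow down the problem.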
|
|
307
274
|
|
|
308
275
|
## Troubleshooting order
|
|
309
276
|
|
|
@@ -312,4 +279,4 @@ print(d.execute_js('return document.title'))
|
|
|
312
279
|
3. If it reports that `config.js` or the manifest cannot be loaded, check `assets/tmwd_cdp_bridge/config.js`.
|
|
313
280
|
4. If it reports that no tab is available, open a normal web page first; do not open only internal pages.
|
|
314
281
|
5. If the extension is not installed, load `assets/tmwd_cdp_bridge`.
|
|
315
|
-
6.
|
|
282
|
+
6. If it still fails, check the Chrome extension's background log and `.agent-browser-cli.log`.
|
|
Binary file
|