dp-cli 0.1.0__tar.gz

dp_cli-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,103 @@
+ Metadata-Version: 2.4
+ Name: dp-cli
+ Version: 0.1.0
+ Summary: A powerful CLI for DrissionPage — browser automation, structured data extraction, network listening and more.
+ License: BSD-3-Clause
+ Project-URL: Homepage, https://github.com/mofanx/dp-cli
+ Project-URL: Repository, https://github.com/mofanx/dp-cli
+ Keywords: drissionpage,browser,automation,cli,web-scraping
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Environment :: Console
+ Classifier: Topic :: Utilities
+ Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ Requires-Dist: DrissionPage>=4.0
+ Requires-Dist: click>=8.0
+
+ # dp-cli
+
+ A powerful CLI for [DrissionPage](https://github.com/g1879/DrissionPage) — browser automation, structured data extraction, network listening and more.
+
+ ## Features
+
+ - **Anti-detection by default** — not based on webdriver, `navigator.webdriver` is `false`
+ - **Reuse your own browser** — connect to a running Chrome via `--port`, keeping login state and cookies
+ - **Powerful locator syntax** — descriptive strings stable across navigation (no ephemeral refs)
+ - **Structured data extraction** — `extract` + `query` + `snapshot --mode content` for scraping list pages
+ - **Network listening** — capture XHR/Fetch requests and response bodies
+ - **Dual mode** — browser control + pure HTTP requests
+ - **Shadow-root / iframe** — traverse directly without switching context
+ - **JSON output** — all commands output JSON, AI-friendly
+
+ ## Installation
+
+ ```bash
+ pip install dp-cli
+ dp --help
+ ```
+
+ ## Quick Start
+
+ ```bash
+ # Auto-managed browser
+ dp open https://example.com
+ dp snapshot
+ dp click "text:Login"
+ dp fill "@name=username" admin
+ dp press Enter
+ dp close
+
+ # Connect to your own logged-in browser
+ google-chrome --remote-debugging-port=9222
+ dp open https://example.com --port 9222
+ dp snapshot
+ ```
+
+ ## Data Extraction (3-step workflow)
+
+ ```bash
+ # 1. Discover CSS class names via noise-filtered content tree
+ dp snapshot --mode content --max-text 40
+
+ # 2. Verify field selectors
+ dp query "css:.item-title" --fields "text,loc"
+
+ # 3. Batch extract to CSV
+ dp extract "css:.item-card" \
+   '{"title":"css:.item-title",
+     "price":"css:.item-price",
+     "tags":{"selector":"css:.tag","multi":true},
+     "url":{"selector":"css:a","attr":"href"}}' \
+   --limit 100 --output csv --filename result.csv
+ ```
+
+ ## Project Structure
+
+ ```
+ dp_cli/
+ ├── main.py              # CLI entry point (~47 lines)
+ ├── session.py           # Browser session management
+ ├── snapshot.py          # Page snapshot & data extraction engine
+ ├── output.py            # JSON output helpers
+ └── commands/
+     ├── _utils.py        # Shared decorators & helpers
+     ├── browser.py       # open / goto / reload / close / list
+     ├── snapshot_cmd.py  # snapshot / extract / query / find / inspect
+     ├── element.py       # click / fill / select / hover / drag / check / upload
+     ├── keyboard.py      # press / type / scroll / scroll-to
+     ├── page.py          # screenshot / pdf / eval / wait / dialog
+     ├── tab.py           # tab-list / tab-new / tab-select / tab-close
+     ├── storage.py       # cookie-* / localstorage-* / sessionstorage-*
+     ├── network.py       # listen / listen-stop / http-get / http-post
+     └── misc.py          # resize / maximize / state-save / state-load / config-set
+ ```
+
+ ## Documentation
+
+ See [`skills/SKILL.md`](skills/SKILL.md) for the full workflow guide and [`skills/references/commands.md`](skills/references/commands.md) for the complete command reference.
+
+ ## License
+
+ BSD-3-Clause
dp_cli-0.1.0/README.md ADDED
@@ -0,0 +1,85 @@
+ # dp-cli
+
+ A powerful CLI for [DrissionPage](https://github.com/g1879/DrissionPage) — browser automation, structured data extraction, network listening and more.
+
+ ## Features
+
+ - **Anti-detection by default** — not based on webdriver, `navigator.webdriver` is `false`
+ - **Reuse your own browser** — connect to a running Chrome via `--port`, keeping login state and cookies
+ - **Powerful locator syntax** — descriptive strings stable across navigation (no ephemeral refs)
+ - **Structured data extraction** — `extract` + `query` + `snapshot --mode content` for scraping list pages
+ - **Network listening** — capture XHR/Fetch requests and response bodies
+ - **Dual mode** — browser control + pure HTTP requests
+ - **Shadow-root / iframe** — traverse directly without switching context
+ - **JSON output** — all commands output JSON, AI-friendly
+
+ ## Installation
+
+ ```bash
+ pip install dp-cli
+ dp --help
+ ```
+
+ ## Quick Start
+
+ ```bash
+ # Auto-managed browser
+ dp open https://example.com
+ dp snapshot
+ dp click "text:Login"
+ dp fill "@name=username" admin
+ dp press Enter
+ dp close
+
+ # Connect to your own logged-in browser
+ google-chrome --remote-debugging-port=9222
+ dp open https://example.com --port 9222
+ dp snapshot
+ ```
+
+ ## Data Extraction (3-step workflow)
+
+ ```bash
+ # 1. Discover CSS class names via noise-filtered content tree
+ dp snapshot --mode content --max-text 40
+
+ # 2. Verify field selectors
+ dp query "css:.item-title" --fields "text,loc"
+
+ # 3. Batch extract to CSV
+ dp extract "css:.item-card" \
+   '{"title":"css:.item-title",
+     "price":"css:.item-price",
+     "tags":{"selector":"css:.tag","multi":true},
+     "url":{"selector":"css:a","attr":"href"}}' \
+   --limit 100 --output csv --filename result.csv
+ ```
+
+ ## Project Structure
+
+ ```
+ dp_cli/
+ ├── main.py              # CLI entry point (~47 lines)
+ ├── session.py           # Browser session management
+ ├── snapshot.py          # Page snapshot & data extraction engine
+ ├── output.py            # JSON output helpers
+ └── commands/
+     ├── _utils.py        # Shared decorators & helpers
+     ├── browser.py       # open / goto / reload / close / list
+     ├── snapshot_cmd.py  # snapshot / extract / query / find / inspect
+     ├── element.py       # click / fill / select / hover / drag / check / upload
+     ├── keyboard.py      # press / type / scroll / scroll-to
+     ├── page.py          # screenshot / pdf / eval / wait / dialog
+     ├── tab.py           # tab-list / tab-new / tab-select / tab-close
+     ├── storage.py       # cookie-* / localstorage-* / sessionstorage-*
+     ├── network.py       # listen / listen-stop / http-get / http-post
+     └── misc.py          # resize / maximize / state-save / state-load / config-set
+ ```
+
+ ## Documentation
+
+ See [`skills/SKILL.md`](skills/SKILL.md) for the full workflow guide and [`skills/references/commands.md`](skills/references/commands.md) for the complete command reference.
+
+ ## License
+
+ BSD-3-Clause
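The field map passed to `dp extract` in the README mixes two shapes: a bare locator string, and an object with `selector` plus an optional `attr` or `multi`. The following is a minimal sketch of how such a map could be normalized into one uniform shape before extraction. `FieldSpec` and `parse_fields` are hypothetical names, and the defaults (`attr=None`, `multi=False`) are assumptions; none of this is taken from the dp-cli source.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FieldSpec:
    """Hypothetical normalized form of one entry in the field map."""
    selector: str
    attr: Optional[str] = None   # assumed default: read the element text
    multi: bool = False          # assumed default: first match only


def parse_fields(raw: str) -> Dict[str, FieldSpec]:
    """Parse the JSON field map into one FieldSpec per field name."""
    specs = {}
    for name, value in json.loads(raw).items():
        if isinstance(value, str):
            # Short form: the value is the locator itself
            specs[name] = FieldSpec(selector=value)
        elif isinstance(value, dict) and 'selector' in value:
            # Long form: locator plus optional attr / multi
            specs[name] = FieldSpec(selector=value['selector'],
                                    attr=value.get('attr'),
                                    multi=bool(value.get('multi', False)))
        else:
            raise ValueError(
                f'field {name!r}: expected a locator string or an object with "selector"')
    return specs


# The field map from the README example
FIELDS = '''{"title":"css:.item-title",
 "price":"css:.item-price",
 "tags":{"selector":"css:.tag","multi":true},
 "url":{"selector":"css:a","attr":"href"}}'''

specs = parse_fields(FIELDS)
for name, spec in specs.items():
    print(name, spec)
```

Normalizing up front keeps the extraction loop free of per-field type checks and rejects a malformed map before any browser work starts.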
@@ -0,0 +1 @@
+ # -*- coding:utf-8 -*-
@@ -0,0 +1,12 @@
+ # -*- coding:utf-8 -*-
+ from dp_cli.commands import (
+     browser, snapshot_cmd, element, keyboard,
+     page, tab, storage, network, misc,
+ )
+
+ _MODULES = [browser, snapshot_cmd, element, keyboard, page, tab, storage, network, misc]
+
+
+ def register_all(cli):
+     for mod in _MODULES:
+         mod.register(cli)
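`register_all` relies on one convention: every command module exposes a `register(cli)` function that attaches its commands to the group it is given. A minimal stdlib-only sketch of that pattern follows. `FakeCLI` is a hypothetical stand-in for the click group, and the two `_register_*` functions stand in for real modules; only the `register(cli)` contract is taken from the diff.

```python
from types import SimpleNamespace


class FakeCLI:
    """Hypothetical stand-in for a click group: records registered commands."""

    def __init__(self):
        self.commands = {}

    def command(self, name=None):
        # Mimics the decorator shape of a group's command(): optional explicit name
        def decorator(func):
            self.commands[name or func.__name__] = func
            return func
        return decorator


def _register_browser(cli):
    @cli.command('open')
    def cmd_open():
        return 'open'

    @cli.command()
    def goto():
        return 'goto'


def _register_tab(cli):
    @cli.command('tab-list')
    def tab_list():
        return 'tab-list'


# Each "module" only needs to expose register(cli)
_MODULES = [SimpleNamespace(register=_register_browser),
            SimpleNamespace(register=_register_tab)]


def register_all(cli):
    for mod in _MODULES:
        mod.register(cli)


cli = FakeCLI()
register_all(cli)
print(sorted(cli.commands))  # ['goto', 'open', 'tab-list']
```

Because commands are defined inside `register`, importing a command module has no side effects on any CLI object; the entry point decides which group receives them.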
@@ -0,0 +1,107 @@
+ # -*- coding:utf-8 -*-
+ """Utility functions and decorators shared by all command modules."""
+ import io
+ import csv
+ import click
+
+ from dp_cli.session import get_browser, load_refs, load_session, save_session
+ from dp_cli.output import error
+
+
+ def normalize_url(url: str) -> str:
+     """Add a missing URL scheme, so http:// or https:// may be omitted."""
+     if not url:
+         return url
+     if not url.startswith(('http://', 'https://', 'file://')):
+         return 'https://' + url
+     return url
+
+
+ def session_option(f):
+     return click.option('-s', '--session', default='default',
+                         help='Session name; defaults to "default"', show_default=True)(f)
+
+
+ def _get_page(session: str, raw: bool = False):
+     """Get the page object; report an error and exit on failure.
+
+     :param raw: If True, always return the ChromiumPage (for browser-level operations such as tab management).
+                 If False, return the bound ChromiumTab when there is one, otherwise the ChromiumPage.
+     """
+     try:
+         page = get_browser(session)
+     except Exception as e:
+         error(f'Cannot connect to browser session [{session}]; run dp open first',
+               code='SESSION_NOT_FOUND', detail=str(e))
+         return
+
+     if raw:
+         return page
+
+     # Check whether a tab is bound to this session
+     sess = load_session(session)
+     tab_id = sess.get('active_tab')
+     if tab_id:
+         try:
+             tab = page.get_tab(tab_id)
+             return tab
+         except Exception:
+             # The tab may have been closed; clear the binding
+             sess.pop('active_tab', None)
+             save_session(session, sess)
+
+     return page
+
+
+ def resolve_locator(locator: str, session: str = 'default') -> str:
+     """Resolve a locator, with support for the ref:N syntax.
+
+     If locator starts with 'ref:', look up the real locator in the session's refs mapping.
+     Otherwise return it unchanged.
+     """
+     if not locator.startswith('ref:'):
+         return locator
+
+     ref_id = locator[4:]
+     refs = load_refs(session)
+     if not refs:
+         error('No ref mapping available; run dp snapshot first',
+               code='NO_REFS')
+         raise SystemExit(1)
+
+     ref_data = refs.get(ref_id)
+     if not ref_data:
+         available = sorted(refs.keys(), key=lambda x: int(x) if x.isdigit() else 0)
+         hint = f"Available range: ref:1 ~ ref:{available[-1]}" if available else ""
+         error(f'ref:{ref_id} does not exist. {hint}',
+               code='REF_NOT_FOUND')
+         raise SystemExit(1)
+
+     real_loc = ref_data.get('locator')
+     if real_loc and not real_loc.startswith('t:'):
+         return real_loc
+
+     # When the locator is unusable (e.g. t:p), fall back to the name as a text locator
+     name = ref_data.get('name', '')
+     if name and len(name) <= 50:
+         return f'text:{name}'
+
+     error(f'ref:{ref_id} cannot be resolved to a valid locator (role={ref_data.get("role")})',
+           code='REF_UNRESOLVABLE')
+     raise SystemExit(1)
+
+
+ def records_to_csv(records: list) -> str:
+     """Convert a list of records to a CSV string (no BOM is added here; write it as utf-8-sig for Excel)."""
+     if not records:
+         return ''
+     fields = list(records[0].keys())
+     buf = io.StringIO()
+     writer = csv.DictWriter(buf, fieldnames=fields, extrasaction='ignore',
+                             lineterminator='\n')
+     writer.writeheader()
+     for row in records:
+         clean = {k: ('|'.join(str(i) for i in v) if isinstance(v, list) else v)
+                  for k, v in row.items()}
+         writer.writerow(clean)
+     return buf.getvalue()
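Two behaviors of `records_to_csv` are easy to miss when reading the diff: the column set is fixed by the first record (later extra keys are silently dropped), and list values are joined with `|`. The returned string also carries no BOM character, so a caller that wants Excel to detect UTF-8 has to add one when writing. The sketch below copies the helper's logic and writes the result with `utf-8-sig`; the sample records and the temporary file path are illustrative assumptions, and how dp-cli itself writes the file is not shown in this part of the diff.

```python
import csv
import io
import os
import tempfile


def records_to_csv(records: list) -> str:
    """Same logic as the helper in the diff: first record fixes the columns."""
    if not records:
        return ''
    fields = list(records[0].keys())
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction='ignore',
                            lineterminator='\n')
    writer.writeheader()
    for row in records:
        # Lists become a single "a|b|c" cell; other values pass through
        clean = {k: ('|'.join(str(i) for i in v) if isinstance(v, list) else v)
                 for k, v in row.items()}
        writer.writerow(clean)
    return buf.getvalue()


# Illustrative records: the second one has an empty list and an extra key
records = [
    {'title': 'A', 'tags': ['x', 'y'], 'price': 10},
    {'title': 'B', 'tags': [], 'price': 20, 'extra': 'dropped'},
]
text = records_to_csv(records)
print(text)

# Writing with utf-8-sig prepends the BOM that Excel uses to detect UTF-8
path = os.path.join(tempfile.mkdtemp(), 'result.csv')
with open(path, 'w', encoding='utf-8-sig', newline='') as fh:
    fh.write(text)

with open(path, 'rb') as fh:
    head = fh.read(3)
print(head == b'\xef\xbb\xbf')
```

If records can have differing keys, collecting the union of keys before building the writer would avoid the silent drop; that is a design alternative, not what the helper in the diff does.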
@@ -0,0 +1,159 @@
+ # -*- coding:utf-8 -*-
+ """Browser lifecycle commands: open / goto / reload / go-back / go-forward / close / close-all / list / delete-data"""
+ import click
+
+ from dp_cli.session import (get_browser, close_browser, list_sessions,
+                             delete_session, load_session, save_session)
+ from dp_cli.output import ok, error, format_page_info
+ from dp_cli.commands._utils import session_option, _get_page, normalize_url
+
+
+ def register(cli):
+
+     @cli.command('open')
+     @click.argument('url', required=False)
+     @session_option
+     @click.option('--headless', is_flag=True, help='Headless mode')
+     @click.option('--browser', 'browser_path', default=None, help='Path to the browser executable')
+     @click.option('--profile', 'user_data_dir', default=None, help='User data directory')
+     @click.option('--proxy', default=None, help='Proxy server, e.g. http://127.0.0.1:7890')
+     @click.option('--port', type=int, default=None, help='Connect to an existing browser instance on this port')
+     @click.option('--new', is_flag=True, help='Force a new instance (do not reuse an existing session)')
+     def cmd_open(url, session, headless, browser_path, user_data_dir, proxy, port, new):
+         """Open a browser and optionally navigate to a URL.
+
+         \b
+         [Reuse your own browser] (the most common case; keeps login state, cookies and history)
+         Step 1: start your own Chrome/Chromium in debugging mode:
+           google-chrome --remote-debugging-port=9222
+         Step 2: attach with dp:
+           dp open --port 9222
+           dp open https://example.com --port 9222
+         Step 3: later commands no longer need --port (the session remembers the port):
+           dp snapshot
+           dp click "text:Login"
+
+         \b
+         [Browser managed automatically by dp]
+           dp open
+           dp open https://example.com
+           dp open https://example.com --headless
+           dp -s work open https://github.com
+         """
+         if new:
+             delete_session(session)
+         try:
+             page = get_browser(session, headless=headless, browser_path=browser_path,
+                                user_data_dir=user_data_dir, proxy=proxy, port=port)
+         except Exception as e:
+             error(f'Failed to start browser: {e}', code='BROWSER_START_FAILED', detail=str(e))
+             return
+         if url:
+             try:
+                 page.get(normalize_url(url))
+             except Exception as e:
+                 error(f'Navigation failed: {e}', code='NAVIGATE_FAILED', detail=str(e))
+                 return
+         ok(format_page_info(page), msg='Browser ready')
+
+     @cli.command()
+     @click.argument('url')
+     @session_option
+     @click.option('--timeout', default=30, help='Timeout in seconds', show_default=True)
+     @click.option('--retry', default=3, help='Number of retries', show_default=True)
+     def goto(url, session, timeout, retry):
+         """Navigate to the given URL.
+
+         \b
+         Examples:
+           dp goto https://example.com
+           dp goto example.com
+           dp goto example.com --timeout 60
+         """
+         page = _get_page(session)
+         try:
+             page.get(normalize_url(url), timeout=timeout, retry=retry)
+             ok(format_page_info(page))
+         except Exception as e:
+             error(f'Failed to navigate to {url}', code='NAVIGATE_FAILED', detail=str(e))
+
+     @cli.command()
+     @session_option
+     def reload(session):
+         """Reload the current page."""
+         page = _get_page(session)
+         try:
+             page.get(page.url)
+             ok(format_page_info(page))
+         except Exception as e:
+             error('Reload failed', code='RELOAD_FAILED', detail=str(e))
+
+     @cli.command('go-back')
+     @session_option
+     def go_back(session):
+         """Go back in browser history."""
+         page = _get_page(session)
+         try:
+             page.back()
+             ok(format_page_info(page))
+         except Exception as e:
+             error('Go back failed', code='NAVIGATE_FAILED', detail=str(e))
+
+     @cli.command('go-forward')
+     @session_option
+     def go_forward(session):
+         """Go forward in browser history."""
+         page = _get_page(session)
+         try:
+             page.forward()
+             ok(format_page_info(page))
+         except Exception as e:
+             error('Go forward failed', code='NAVIGATE_FAILED', detail=str(e))
+
+     @cli.command('close')
+     @session_option
+     @click.option('--del-data', is_flag=True, help='Also delete the user data directory')
+     @click.option('--force', is_flag=True, help='Force-close the browser (in user_connected mode the default is to only disconnect)')
+     def cmd_close(session, del_data, force):
+         """Close a browser session.
+
+         If the session is attached to your own browser via --port, the default is to disconnect without closing the browser.
+         Only --force actually closes the browser process.
+         """
+         sess = load_session(session)
+         if not sess:
+             error(f'Session [{session}] does not exist', code='SESSION_NOT_FOUND')
+             return
+         user_connected = sess.get('user_connected', False)
+         if user_connected and not force:
+             delete_session(session)
+             ok(msg=f'Disconnected from [{session}] (the browser is still running). Use --force to close the browser.')
+         else:
+             result = close_browser(session, del_data=del_data)
+             if result:
+                 ok(msg=f'Session [{session}] closed')
+             else:
+                 error('Close failed', code='CLOSE_FAILED')
+
+     @cli.command('close-all')
+     def close_all():
+         """Close all sessions."""
+         sessions = list_sessions()
+         closed = []
+         for s in sessions:
+             close_browser(s['name'])
+             closed.append(s['name'])
+         ok({'closed': closed}, msg=f'Closed {len(closed)} session(s)')
+
+     @cli.command('list')
+     def cmd_list():
+         """List all active sessions."""
+         sessions = list_sessions()
+         ok({'sessions': sessions, 'count': len(sessions)})
+
+     @cli.command('delete-data')
+     @session_option
+     def delete_data(session):
+         """Delete the session's user data directory."""
+         close_browser(session, del_data=True)
+         ok(msg=f'Data for session [{session}] deleted')
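The branching in `dp close` is the part of this file most likely to surprise a user: a session attached to the user's own browser is only detached unless `--force` is given, and in that detach branch `--del-data` has no effect because `close_browser` is never called. The pure function below is a hedged restatement of that decision for illustration. `close_action` and its return labels are hypothetical names, and the `port` key in the sample records is an assumption; only `user_connected` appears in the diff.

```python
def close_action(sess: dict, force: bool) -> str:
    """Return which branch `dp close` would take for a stored session record.

    Mirrors the conditionals in cmd_close; performs no browser operations.
    """
    if not sess:
        # No stored record: the command reports SESSION_NOT_FOUND
        return 'SESSION_NOT_FOUND'
    if sess.get('user_connected', False) and not force:
        # User's own browser: drop the session record, leave the browser running
        return 'detach'
    # dp-managed browser, or --force: actually close the browser
    return 'close'


cases = [
    ({}, False),
    ({'user_connected': True, 'port': 9222}, False),
    ({'user_connected': True, 'port': 9222}, True),
    ({'port': 9222}, False),
]
results = [close_action(sess, force) for sess, force in cases]
print(results)
# ['SESSION_NOT_FOUND', 'detach', 'close', 'close']
```

By contrast, `close-all` in the diff calls `close_browser` for every listed session without checking `user_connected`, so it does not apply the same detach-by-default rule.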