dp-cli 0.1.1__tar.gz → 0.2.0__tar.gz

Files changed (43)
  1. dp_cli-0.2.0/PKG-INFO +186 -0
  2. dp_cli-0.2.0/README.md +165 -0
  3. dp_cli-0.2.0/dp_cli/bridge.py +500 -0
  4. dp_cli-0.2.0/dp_cli/bridge_manager.py +219 -0
  5. dp_cli-0.2.0/dp_cli/commands/_utils.py +197 -0
  6. dp_cli-0.2.0/dp_cli/commands/browser.py +316 -0
  7. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/commands/element.py +35 -1
  8. dp_cli-0.2.0/dp_cli/commands/keyboard.py +225 -0
  9. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/commands/page.py +36 -12
  10. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/commands/snapshot_cmd.py +2 -2
  11. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/commands/tab.py +1 -1
  12. dp_cli-0.2.0/dp_cli/session.py +414 -0
  13. dp_cli-0.2.0/dp_cli/stealth.py +368 -0
  14. dp_cli-0.2.0/dp_cli.egg-info/PKG-INFO +186 -0
  15. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli.egg-info/SOURCES.txt +6 -1
  16. dp_cli-0.2.0/dp_cli.egg-info/requires.txt +5 -0
  17. {dp_cli-0.1.1 → dp_cli-0.2.0}/pyproject.toml +4 -1
  18. dp_cli-0.2.0/tests/test_bridge_integration.py +210 -0
  19. dp_cli-0.2.0/tests/test_bridge_manager.py +166 -0
  20. dp_cli-0.1.1/PKG-INFO +0 -103
  21. dp_cli-0.1.1/README.md +0 -85
  22. dp_cli-0.1.1/dp_cli/commands/_utils.py +0 -107
  23. dp_cli-0.1.1/dp_cli/commands/browser.py +0 -159
  24. dp_cli-0.1.1/dp_cli/commands/keyboard.py +0 -126
  25. dp_cli-0.1.1/dp_cli/session.py +0 -218
  26. dp_cli-0.1.1/dp_cli.egg-info/PKG-INFO +0 -103
  27. dp_cli-0.1.1/dp_cli.egg-info/requires.txt +0 -2
  28. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/__init__.py +0 -0
  29. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/commands/__init__.py +0 -0
  30. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/commands/misc.py +0 -0
  31. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/commands/network.py +0 -0
  32. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/commands/storage.py +0 -0
  33. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/main.py +0 -0
  34. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/output.py +0 -0
  35. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/snapshot/__init__.py +0 -0
  36. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/snapshot/a11y.py +0 -0
  37. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/snapshot/extract.py +0 -0
  38. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/snapshot/js_scripts.py +0 -0
  39. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli/snapshot/utils.py +0 -0
  40. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli.egg-info/dependency_links.txt +0 -0
  41. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli.egg-info/entry_points.txt +0 -0
  42. {dp_cli-0.1.1 → dp_cli-0.2.0}/dp_cli.egg-info/top_level.txt +0 -0
  43. {dp_cli-0.1.1 → dp_cli-0.2.0}/setup.cfg +0 -0
dp_cli-0.2.0/PKG-INFO ADDED
@@ -0,0 +1,186 @@
1
+ Metadata-Version: 2.4
2
+ Name: dp-cli
3
+ Version: 0.2.0
4
+ Summary: A powerful CLI for DrissionPage — browser automation, structured data extraction, network listening and more.
5
+ License: BSD-3-Clause
6
+ Project-URL: Homepage, https://github.com/mofanx/dp-cli
7
+ Project-URL: Repository, https://github.com/mofanx/dp-cli
8
+ Keywords: drissionpage,browser,automation,cli,web-scraping
9
+ Classifier: Programming Language :: Python :: 3
10
+ Classifier: Development Status :: 3 - Alpha
11
+ Classifier: Environment :: Console
12
+ Classifier: Topic :: Utilities
13
+ Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
14
+ Requires-Python: >=3.8
15
+ Description-Content-Type: text/markdown
16
+ Requires-Dist: DrissionPage>=4.0
17
+ Requires-Dist: click>=8.0
18
+ Requires-Dist: aiohttp>=3.9
19
+ Requires-Dist: websockets>=12
20
+ Requires-Dist: requests>=2.28
21
+
22
+ # dp-cli
23
+
24
+ A powerful CLI for [DrissionPage](https://github.com/g1879/DrissionPage) — browser automation, structured data extraction, network listening and more.
25
+
26
+ ## Features
27
+
28
+ - **Anti-detection by default** — not based on webdriver, `navigator.webdriver` is `false`
29
+ - **Reuse your own browser** — connect to a running Chrome via `--port`, keeping login state and cookies
30
+ - **Powerful locator syntax** — descriptive strings stable across navigation (no ephemeral refs)
31
+ - **Structured data extraction** — `extract` + `query` + `snapshot --mode content` for scraping list pages
32
+ - **Network listening** — capture XHR/Fetch requests and response bodies
33
+ - **Dual mode** — browser control + pure HTTP requests
34
+ - **Shadow-root / iframe** — traverse directly without switching context
35
+ - **JSON output** — all commands output JSON, AI-friendly
36
+
37
+ ## Installation
38
+
39
+ ```bash
40
+ pip install dp-cli
41
+ dp --help
42
+ ```
43
+
44
+ ## Quick Start
45
+
46
+ ```bash
47
+ # Auto-managed browser
48
+ dp open https://example.com
49
+ dp snapshot
50
+ dp click "text:Login"
51
+ dp fill "@name=username" admin
52
+ dp press Enter
53
+ dp close
54
+
55
+ # Connect to your own logged-in browser
56
+ google-chrome --remote-debugging-port=9222
57
+ dp open https://example.com --port 9222
58
+ dp snapshot
59
+ ```
60
+
61
+ ## Connect to a Normally-Launched Chrome (Chrome 144+)
62
+
63
+ No `--remote-debugging-port` required. Chrome 144+ exposes opt-in remote debugging
64
+ via `chrome://inspect`:
65
+
66
+ 1. Open your Chrome as usual (no special flags)
67
+ 2. Visit `chrome://inspect/#remote-debugging`
68
+ 3. Check **"Allow remote debugging for this browser instance"**
69
+ 4. Run `dp open --auto-connect`
70
+
71
+ ```bash
72
+ dp open --auto-connect # stable channel, default profile
73
+ dp open --auto-connect --channel beta # pick a different channel
74
+ dp open --auto-connect --probe-dir ~/my-profile # custom user-data-dir
75
+ ```
76
+
77
+ ### How it works
78
+
79
+ Chrome 144+ in this mode exposes **only** a browser-level WebSocket and omits the HTTP
80
+ REST API (`/json`, `/json/version`, ...) that DrissionPage / Puppeteer / Playwright
81
+ depend on. `dp-cli` transparently handles this:
82
+
83
+ 1. Reads `DevToolsActivePort` from the user-data-dir → real CDP port
84
+ 2. Probes the port — if `/json/version` is missing, identifies this as inspect mode
85
+ 3. Spawns a local bridge (`python -m dp_cli.bridge`) that:
86
+ - Synthesizes the missing HTTP endpoints from CDP calls
87
+ - Multiplexes page-level CDP traffic over a single browser-level WebSocket
88
+ via `Target.attachToTarget(flatten=True)`
89
+ 4. Points DrissionPage at the bridge. Subsequent `dp` commands reuse the same bridge.
90
+
91
+ The bridge subprocess and its port are tracked in the session file; `dp close` stops
92
+ the bridge automatically and never quits your Chrome (it's your browser, not dp's).
93
+
94
+ ### Caveats
95
+
96
+ - Chrome always shows an **"Allow remote debugging"** dialog per new WebSocket client.
97
+ Since the bridge maintains one WebSocket and dp commands share it, you confirm at most
98
+ once per `dp open --auto-connect`.
99
+ - Works with whatever profile Chrome is actually using — same cookies, logins, history.
100
+ - Classic `--remote-debugging-port=9222` mode still works unchanged via `dp open --port 9222`.
101
+
102
+ ## Anti-Detection (stealth)
103
+
104
+ Patches common automation fingerprints: `navigator.webdriver`, the `HeadlessChrome` UA, empty `plugins`,
105
+ SwiftShader WebGL, a missing `chrome.runtime`, and others.
106
+
107
+ ```bash
108
+ # One-shot: connect + apply full stealth patches
109
+ dp open --port 9322 --stealth
110
+ dp goto https://bot.sannysoft.com/
111
+
112
+ # Or apply manually on an existing session (full preset by default)
113
+ dp stealth
114
+ dp stealth --preset mild # webdriver + UA only
115
+ dp stealth --ua "Mozilla/5.0 ..." # custom UA
116
+ dp stealth --feature webdriver --feature webgl # fine-grained
117
+ ```
118
+
119
+ ### Recommended VPS Chrome flags (when connecting via SSH tunnel)
120
+
121
+ ```bash
122
+ google-chrome --headless=new --remote-debugging-port=9222 \
123
+ --no-sandbox --disable-dev-shm-usage \
124
+ --disable-blink-features=AutomationControlled \
125
+ --user-data-dir="$HOME/.config/google-chrome"
126
+ # Then on local:
127
+ ssh -NL 9322:127.0.0.1:9222 vps
128
+ dp open --port 9322 --stealth
129
+ ```
130
+
131
+ Patched features (full preset): `webdriver`, `UA`, `chrome.runtime`, `permissions`,
132
+ `plugins`, `languages`, `WebGL VENDOR/RENDERER`, `window.outerWidth/Height`.
133
+
134
+ Patches are injected via `Page.addScriptToEvaluateOnNewDocument` — they persist across
135
+ navigations and frames. Advanced fingerprints (Canvas/Audio/font list) require a real
136
+ GPU or Xvfb environment.
137
+
138
+ ## Data Extraction (3-step workflow)
139
+
140
+ ```bash
141
+ # 1. Discover CSS class names via noise-filtered content tree
142
+ dp snapshot --mode content --max-text 40
143
+
144
+ # 2. Verify field selectors
145
+ dp query "css:.item-title" --fields "text,loc"
146
+
147
+ # 3. Batch extract to CSV
148
+ dp extract "css:.item-card" \
149
+ '{"title":"css:.item-title",
150
+ "price":"css:.item-price",
151
+ "tags":{"selector":"css:.tag","multi":true},
152
+ "url":{"selector":"css:a","attr":"href"}}' \
153
+ --limit 100 --output csv --filename result.csv
154
+ ```
155
+
156
+ ## Project Structure
157
+
158
+ ```
159
+ dp_cli/
160
+ ├── main.py # CLI entry point (~47 lines)
161
+ ├── session.py # Browser session management + auto-connect bridge glue
162
+ ├── bridge.py # chrome://inspect mode CDP bridge (python -m dp_cli.bridge)
163
+ ├── bridge_manager.py # Bridge subprocess lifecycle + inspect-mode detection
164
+ ├── stealth.py # Anti-detection JS patches (applied via CDP)
165
+ ├── snapshot/ # a11y-tree snapshot & data extraction engine
166
+ ├── output.py # JSON output helpers
167
+ └── commands/
168
+ ├── _utils.py # Shared decorators & helpers
169
+ ├── browser.py # open / goto / reload / close / list / stealth
170
+ ├── snapshot_cmd.py # snapshot / extract / query / find / inspect
171
+ ├── element.py # click / fill / select / hover / drag / check / upload / count
172
+ ├── keyboard.py # press / type / scroll / scroll-to / autoscroll
173
+ ├── page.py # screenshot / pdf / eval / wait (idle/loaded/url/title) / dialog
174
+ ├── tab.py # tab-list / tab-new / tab-select / tab-close
175
+ ├── storage.py # cookie-* / localstorage-* / sessionstorage-*
176
+ ├── network.py # listen / listen-stop / http-get / http-post
177
+ └── misc.py # resize / maximize / state-save / state-load / config-set
178
+ ```
179
+
180
+ ## Documentation
181
+
182
+ See [`skills/SKILL.md`](skills/SKILL.md) for the full workflow guide and [`skills/references/commands.md`](skills/references/commands.md) for the complete command reference.
183
+
184
+ ## License
185
+
186
+ BSD-3-Clause
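Note on the "How it works" steps in the PKG-INFO description above: steps 1 and 2 (read `DevToolsActivePort`, then probe `/json/version`) can be sketched roughly as follows. This is a minimal illustration, not the code in `dp_cli/bridge_manager.py`. The function names `parse_devtools_active_port`, `http_json_version_ok`, and `detect_mode` are hypothetical, and the sketch assumes that any failed probe means inspect mode, which is a simplification of "if `/json/version` is missing".

```python
import http.client
import urllib.request
from pathlib import Path
from typing import Callable, Optional, Tuple


def parse_devtools_active_port(text: str) -> Tuple[int, Optional[str]]:
    """Parse DevToolsActivePort: line 1 is the port, line 2 (if present) the browser WebSocket path."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("DevToolsActivePort is empty")
    port = int(lines[0])
    ws_path = lines[1] if len(lines) > 1 else None
    return port, ws_path


def http_json_version_ok(port: int, timeout: float = 1.0) -> bool:
    """Return True if the classic HTTP endpoint /json/version answers with 200."""
    url = f"http://127.0.0.1:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, and HTTP error statuses such as 404.
        return False


def detect_mode(user_data_dir: Path,
                probe: Callable[[int], bool] = http_json_version_ok) -> dict:
    """Hypothetical helper: classify a running Chrome as 'classic' or 'inspect' mode."""
    text = (user_data_dir / "DevToolsActivePort").read_text()
    port, ws_path = parse_devtools_active_port(text)
    # Assumption: a failed probe is treated as inspect mode (WebSocket only).
    mode = "classic" if probe(port) else "inspect"
    return {"port": port, "ws_path": ws_path, "mode": mode}
```

The `probe` parameter is injectable so the classification can be exercised without a running browser; a real implementation would likely also verify that the port is reachable at all before deciding to start a bridge.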
dp_cli-0.2.0/README.md ADDED
@@ -0,0 +1,165 @@
1
+ # dp-cli
2
+
3
+ A powerful CLI for [DrissionPage](https://github.com/g1879/DrissionPage) — browser automation, structured data extraction, network listening and more.
4
+
5
+ ## Features
6
+
7
+ - **Anti-detection by default** — not based on webdriver, `navigator.webdriver` is `false`
8
+ - **Reuse your own browser** — connect to a running Chrome via `--port`, keeping login state and cookies
9
+ - **Powerful locator syntax** — descriptive strings stable across navigation (no ephemeral refs)
10
+ - **Structured data extraction** — `extract` + `query` + `snapshot --mode content` for scraping list pages
11
+ - **Network listening** — capture XHR/Fetch requests and response bodies
12
+ - **Dual mode** — browser control + pure HTTP requests
13
+ - **Shadow-root / iframe** — traverse directly without switching context
14
+ - **JSON output** — all commands output JSON, AI-friendly
15
+
16
+ ## Installation
17
+
18
+ ```bash
19
+ pip install dp-cli
20
+ dp --help
21
+ ```
22
+
23
+ ## Quick Start
24
+
25
+ ```bash
26
+ # Auto-managed browser
27
+ dp open https://example.com
28
+ dp snapshot
29
+ dp click "text:Login"
30
+ dp fill "@name=username" admin
31
+ dp press Enter
32
+ dp close
33
+
34
+ # Connect to your own logged-in browser
35
+ google-chrome --remote-debugging-port=9222
36
+ dp open https://example.com --port 9222
37
+ dp snapshot
38
+ ```
39
+
40
+ ## Connect to a Normally-Launched Chrome (Chrome 144+)
41
+
42
+ No `--remote-debugging-port` required. Chrome 144+ exposes opt-in remote debugging
43
+ via `chrome://inspect`:
44
+
45
+ 1. Open your Chrome as usual (no special flags)
46
+ 2. Visit `chrome://inspect/#remote-debugging`
47
+ 3. Check **"Allow remote debugging for this browser instance"**
48
+ 4. Run `dp open --auto-connect`
49
+
50
+ ```bash
51
+ dp open --auto-connect # stable channel, default profile
52
+ dp open --auto-connect --channel beta # pick a different channel
53
+ dp open --auto-connect --probe-dir ~/my-profile # custom user-data-dir
54
+ ```
55
+
56
+ ### How it works
57
+
58
+ Chrome 144+ in this mode exposes **only** a browser-level WebSocket and omits the HTTP
59
+ REST API (`/json`, `/json/version`, ...) that DrissionPage / Puppeteer / Playwright
60
+ depend on. `dp-cli` transparently handles this:
61
+
62
+ 1. Reads `DevToolsActivePort` from the user-data-dir → real CDP port
63
+ 2. Probes the port — if `/json/version` is missing, identifies this as inspect mode
64
+ 3. Spawns a local bridge (`python -m dp_cli.bridge`) that:
65
+ - Synthesizes the missing HTTP endpoints from CDP calls
66
+ - Multiplexes page-level CDP traffic over a single browser-level WebSocket
67
+ via `Target.attachToTarget(flatten=True)`
68
+ 4. Points DrissionPage at the bridge. Subsequent `dp` commands reuse the same bridge.
69
+
70
+ The bridge subprocess and its port are tracked in the session file; `dp close` stops
71
+ the bridge automatically and never quits your Chrome (it's your browser, not dp's).
72
+
73
+ ### Caveats
74
+
75
+ - Chrome always shows an **"Allow remote debugging"** dialog per new WebSocket client.
76
+ Since the bridge maintains one WebSocket and dp commands share it, you confirm at most
77
+ once per `dp open --auto-connect`.
78
+ - Works with whatever profile Chrome is actually using — same cookies, logins, history.
79
+ - Classic `--remote-debugging-port=9222` mode still works unchanged via `dp open --port 9222`.
80
+
81
+ ## Anti-Detection (stealth)
82
+
83
+ Patches common automation fingerprints: `navigator.webdriver`, the `HeadlessChrome` UA, empty `plugins`,
84
+ SwiftShader WebGL, a missing `chrome.runtime`, and others.
85
+
86
+ ```bash
87
+ # One-shot: connect + apply full stealth patches
88
+ dp open --port 9322 --stealth
89
+ dp goto https://bot.sannysoft.com/
90
+
91
+ # Or apply manually on an existing session (full preset by default)
92
+ dp stealth
93
+ dp stealth --preset mild # webdriver + UA only
94
+ dp stealth --ua "Mozilla/5.0 ..." # custom UA
95
+ dp stealth --feature webdriver --feature webgl # fine-grained
96
+ ```
97
+
98
+ ### Recommended VPS Chrome flags (when connecting via SSH tunnel)
99
+
100
+ ```bash
101
+ google-chrome --headless=new --remote-debugging-port=9222 \
102
+ --no-sandbox --disable-dev-shm-usage \
103
+ --disable-blink-features=AutomationControlled \
104
+ --user-data-dir="$HOME/.config/google-chrome"
105
+ # Then on local:
106
+ ssh -NL 9322:127.0.0.1:9222 vps
107
+ dp open --port 9322 --stealth
108
+ ```
109
+
110
+ Patched features (full preset): `webdriver`, `UA`, `chrome.runtime`, `permissions`,
111
+ `plugins`, `languages`, `WebGL VENDOR/RENDERER`, `window.outerWidth/Height`.
112
+
113
+ Patches are injected via `Page.addScriptToEvaluateOnNewDocument` — they persist across
114
+ navigations and frames. Advanced fingerprints (Canvas/Audio/font list) require a real
115
+ GPU or Xvfb environment.
116
+
117
+ ## Data Extraction (3-step workflow)
118
+
119
+ ```bash
120
+ # 1. Discover CSS class names via noise-filtered content tree
121
+ dp snapshot --mode content --max-text 40
122
+
123
+ # 2. Verify field selectors
124
+ dp query "css:.item-title" --fields "text,loc"
125
+
126
+ # 3. Batch extract to CSV
127
+ dp extract "css:.item-card" \
128
+ '{"title":"css:.item-title",
129
+ "price":"css:.item-price",
130
+ "tags":{"selector":"css:.tag","multi":true},
131
+ "url":{"selector":"css:a","attr":"href"}}' \
132
+ --limit 100 --output csv --filename result.csv
133
+ ```
134
+
135
+ ## Project Structure
136
+
137
+ ```
138
+ dp_cli/
139
+ ├── main.py # CLI entry point (~47 lines)
140
+ ├── session.py # Browser session management + auto-connect bridge glue
141
+ ├── bridge.py # chrome://inspect mode CDP bridge (python -m dp_cli.bridge)
142
+ ├── bridge_manager.py # Bridge subprocess lifecycle + inspect-mode detection
143
+ ├── stealth.py # Anti-detection JS patches (applied via CDP)
144
+ ├── snapshot/ # a11y-tree snapshot & data extraction engine
145
+ ├── output.py # JSON output helpers
146
+ └── commands/
147
+ ├── _utils.py # Shared decorators & helpers
148
+ ├── browser.py # open / goto / reload / close / list / stealth
149
+ ├── snapshot_cmd.py # snapshot / extract / query / find / inspect
150
+ ├── element.py # click / fill / select / hover / drag / check / upload / count
151
+ ├── keyboard.py # press / type / scroll / scroll-to / autoscroll
152
+ ├── page.py # screenshot / pdf / eval / wait (idle/loaded/url/title) / dialog
153
+ ├── tab.py # tab-list / tab-new / tab-select / tab-close
154
+ ├── storage.py # cookie-* / localstorage-* / sessionstorage-*
155
+ ├── network.py # listen / listen-stop / http-get / http-post
156
+ └── misc.py # resize / maximize / state-save / state-load / config-set
157
+ ```
158
+
159
+ ## Documentation
160
+
161
+ See [`skills/SKILL.md`](skills/SKILL.md) for the full workflow guide and [`skills/references/commands.md`](skills/references/commands.md) for the complete command reference.
162
+
163
+ ## License
164
+
165
+ BSD-3-Clause
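Note on the README's "How it works" step 3: multiplexing page-level CDP traffic over one browser-level WebSocket relies on flat sessions, where each message carries a top-level `sessionId` after `Target.attachToTarget` is called with `flatten: true`. The sketch below shows one way such routing could work. It is an assumption-laden illustration rather than the logic in `dp_cli/bridge.py`: the class `FlatSessionMux` and its methods are hypothetical, and remapping message ids to avoid collisions between clients is a design choice of this sketch.

```python
import itertools
import json
from typing import Dict, Optional, Tuple


class FlatSessionMux:
    """Hypothetical router between per-page clients and one browser-level CDP socket."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        # upstream id -> (session id, id the client originally used)
        self._pending: Dict[int, Tuple[str, int]] = {}

    def to_browser(self, session_id: str, client_msg: str) -> str:
        """Tag a page-level command with its sessionId and a collision-free id."""
        msg = json.loads(client_msg)
        upstream_id = next(self._next_id)
        self._pending[upstream_id] = (session_id, msg["id"])
        msg["id"] = upstream_id
        msg["sessionId"] = session_id
        return json.dumps(msg)

    def to_client(self, browser_msg: str) -> Optional[Tuple[str, str]]:
        """Return (session id, message for that client), or None if not routable."""
        msg = json.loads(browser_msg)
        if "id" in msg:
            # A response: restore the id the client used when sending the command.
            route = self._pending.pop(msg["id"], None)
            if route is None:
                return None
            session_id, client_id = route
            msg["id"] = client_id
        else:
            # An event: route by the sessionId Chrome attached to it.
            session_id = msg.get("sessionId")
            if session_id is None:
                return None  # browser-level event, no page client to deliver to
        # Page-level clients do not expect a sessionId on their own socket.
        msg.pop("sessionId", None)
        return session_id, json.dumps(msg)
```

Two clients can both send a command with `"id": 1`; the mux assigns distinct upstream ids and restores the original id on the way back, so each client sees what looks like a dedicated page-level connection.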