pydoll-python 1.3.3__tar.gz → 1.5.0__tar.gz
- pydoll_python-1.5.0/PKG-INFO +535 -0
- pydoll_python-1.5.0/README.md +515 -0
- pydoll_python-1.5.0/pydoll/browser/__init__.py +4 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/browser/base.py +28 -20
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/browser/chrome.py +4 -5
- pydoll_python-1.5.0/pydoll/browser/constants.py +6 -0
- pydoll_python-1.5.0/pydoll/browser/edge.py +74 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/browser/managers.py +127 -36
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/browser/options.py +23 -2
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/browser/page.py +94 -1
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/commands/dom.py +71 -1
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/commands/input.py +89 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/commands/page.py +23 -0
- pydoll_python-1.5.0/pydoll/common/__init__.py +1 -0
- pydoll_python-1.5.0/pydoll/common/keyboard.py +101 -0
- pydoll_python-1.5.0/pydoll/common/keys.py +52 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/connection/connection.py +8 -3
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/element.py +96 -6
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pyproject.toml +1 -1
- pydoll_python-1.3.3/PKG-INFO +0 -874
- pydoll_python-1.3.3/README.md +0 -854
- pydoll_python-1.3.3/pydoll/mixins/__init__.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/LICENSE +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/__init__.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/commands/__init__.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/commands/browser.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/commands/fetch.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/commands/network.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/commands/runtime.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/commands/storage.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/commands/target.py +0 -0
- {pydoll_python-1.3.3/pydoll/browser → pydoll_python-1.5.0/pydoll/connection}/__init__.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/connection/managers.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/constants.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/events/__init__.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/events/browser.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/events/dom.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/events/fetch.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/events/network.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/events/page.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/exceptions.py +0 -0
- {pydoll_python-1.3.3/pydoll/connection → pydoll_python-1.5.0/pydoll/mixins}/__init__.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/mixins/find_elements.py +0 -0
- {pydoll_python-1.3.3 → pydoll_python-1.5.0}/pydoll/utils.py +0 -0
@@ -0,0 +1,535 @@
Metadata-Version: 2.3
Name: pydoll-python
Version: 1.5.0
Summary:
Author: Thalison Fernandes
Author-email: thalissfernandes99@gmail.com
Requires-Python: >=3.10,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: aiofiles (>=23.2.1,<24.0.0)
Requires-Dist: aiohttp (>=3.9.5,<4.0.0)
Requires-Dist: bs4 (>=0.0.2,<0.0.3)
Requires-Dist: requests (>=2.31.0,<3.0.0)
Requires-Dist: websockets (>=13.1,<14.0)
Description-Content-Type: text/markdown

<p align="center">
    <h1>🚀 Pydoll: Async Web Automation in Python!</h1>
</p>
<br>
<p align="center">
    <img src="https://github.com/user-attachments/assets/c4615101-d932-4e79-8a08-f50fbc686e3b" alt="Alt text" /> <br><br>
</p>

<p align="center">
    <a href="https://codecov.io/gh/autoscrape-labs/pydoll">
        <img src="https://codecov.io/gh/autoscrape-labs/pydoll/graph/badge.svg?token=40I938OGM9"/>
    </a>
    <img src="https://github.com/thalissonvs/pydoll/actions/workflows/tests.yml/badge.svg" alt="Tests">
    <img src="https://github.com/thalissonvs/pydoll/actions/workflows/ruff-ci.yml/badge.svg" alt="Ruff CI">
    <img src="https://github.com/thalissonvs/pydoll/actions/workflows/release.yml/badge.svg" alt="Release">
    <img src="https://tokei.rs/b1/github/thalissonvs/pydoll" alt="Total lines">
    <img src="https://tokei.rs/b1/github/thalissonvs/pydoll?category=files" alt="Files">
    <img src="https://tokei.rs/b1/github/thalissonvs/pydoll?category=comments" alt="Comments">
    <img src="https://img.shields.io/github/issues/thalissonvs/pydoll?label=Issues" alt="GitHub issues">
    <img src="https://img.shields.io/github/issues-closed/thalissonvs/pydoll?label=Closed issues" alt="GitHub closed issues">
    <img src="https://img.shields.io/github/issues/thalissonvs/pydoll/bug?label=Bugs&color=red" alt="GitHub bug issues">
    <img src="https://img.shields.io/github/issues/thalissonvs/pydoll/enhancement?label=Enhancements&color=purple" alt="GitHub enhancement issues">
</p>
<p align="center">
    <a href="https://trendshift.io/repositories/13125" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13125" alt="thalissonvs%2Fpydoll | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</p>

<p align="center">
    <b>Pydoll</b> is revolutionizing browser automation! Unlike other solutions, it <b>eliminates the need for webdrivers</b>,
    providing a smooth and reliable automation experience with native asynchronous performance.
</p>

<p align="center">
    <a href="#-installation">Installation</a> •
    <a href="#-quick-start">Quick Start</a> •
    <a href="#-core-components">Core Components</a> •
    <a href="#-whats-new">What's New</a> •
    <a href="#-advanced-features">Advanced Features</a>
</p>

## ✨ Key Features

🔹 **Zero Webdrivers!** Say goodbye to webdriver compatibility nightmares
🔹 **Native Captcha Bypass!** Smoothly handles Cloudflare Turnstile and reCAPTCHA v3*
🔹 **Async Performance** for lightning-fast automation
🔹 **Human-like Interactions** that mimic real user behavior
🔹 **Powerful Event System** for reactive automations
🔹 **Multi-browser Support** including Chrome and Edge

> *Note: For Cloudflare captcha, click the checkbox by finding the div containing the iframe and using the `.click()` method. Automatic detection coming soon!

## 🔥 Installation

```bash
pip install pydoll-python
```

## ⚡ Quick Start

Get started with just a few lines of code:

```python
import asyncio
from pydoll.browser.chrome import Chrome
from pydoll.constants import By

async def main():
    async with Chrome() as browser:
        await browser.start()
        page = await browser.get_page()

        # Works with captcha-protected sites
        await page.go_to('https://example-with-cloudflare.com')
        button = await page.find_element(By.CSS_SELECTOR, 'button')
        await button.click()

asyncio.run(main())
```

Need to configure your browser? Easy!

```python
import asyncio
from pydoll.browser.chrome import Chrome
from pydoll.browser.options import Options

options = Options()
# Add a proxy
options.add_argument('--proxy-server=username:password@ip:port')
# Custom browser location
options.binary_location = '/path/to/your/browser'

# `async with` is only valid inside a coroutine, so wrap it in main()
async def main():
    async with Chrome(options=options) as browser:
        await browser.start()
        # Your code here

asyncio.run(main())
```
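
The proxy flag above is a plain string, so it can be assembled from its parts. A minimal sketch, assuming a hypothetical helper named `build_proxy_argument` (not part of Pydoll) and the `username:password@ip:port` layout shown in the example:

```python
from typing import Optional


def build_proxy_argument(
    host: str,
    port: int,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> str:
    """Return a --proxy-server flag in the layout used by the example above.

    Hypothetical helper for illustration; it is not part of Pydoll.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # Credentials are optional; both must be given to be included.
    if username is not None and password is not None:
        credentials = f"{username}:{password}@"
    else:
        credentials = ""
    return f"--proxy-server={credentials}{host}:{port}"
```

The returned string can then be passed to `options.add_argument(...)` as in the configuration example.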

## 🎉 What's New

Version 1.4.0 comes packed with amazing new features:

### 🔤 Advanced Keyboard Control

Full keyboard simulation thanks to [@cleitonleonel](https://github.com/cleitonleonel):

```python
import asyncio
from pydoll.browser.chrome import Chrome
from pydoll.browser.options import Options
from pydoll.common.keys import Keys
from pydoll.constants import By

async def main():
    async with Chrome() as browser:
        await browser.start()
        page = await browser.get_page()
        await page.go_to('https://example.com')

        input_field = await page.find_element(By.CSS_SELECTOR, 'input')
        await input_field.click()

        # Realistic typing with customizable speed
        await input_field.type_keys("hello@example.com", interval=0.2)

        # Special key combinations
        await input_field.key_down(Keys.SHIFT)
        await input_field.send_keys("UPPERCASE")
        await input_field.key_up(Keys.SHIFT)

        # Navigation keys
        await input_field.send_keys(Keys.ENTER)
        await input_field.send_keys(Keys.PAGEDOWN)

asyncio.run(main())
```
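
To make the role of `interval` concrete, here is a minimal sketch of paced typing that does not use Pydoll at all. The `type_with_interval` coroutine and the `press` callback are hypothetical names chosen for illustration; how `type_keys` works internally is not shown in this README.

```python
import asyncio
from typing import Awaitable, Callable, List


async def type_with_interval(
    text: str,
    press: Callable[[str], Awaitable[None]],
    interval: float = 0.1,
) -> None:
    """Send one character at a time, pausing `interval` seconds between keys."""
    for index, char in enumerate(text):
        await press(char)
        # No pause is needed after the final character.
        if index < len(text) - 1:
            await asyncio.sleep(interval)


async def demo() -> List[str]:
    typed: List[str] = []

    async def fake_press(char: str) -> None:
        # Stand-in for a real key event; it only records the character.
        typed.append(char)

    await type_with_interval("hello", fake_press, interval=0.01)
    return typed
```

Running `asyncio.run(demo())` collects the characters in the order they were "pressed", which is the property a paced typing routine has to preserve.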

### 📁 File Upload Support

[@yie1d](https://github.com/yie1d) brings seamless file uploads:

```python
# For input elements
file_input = await page.find_element(By.XPATH, '//input[@type="file"]')
await file_input.set_input_files('path/to/file.pdf')  # Single file
await file_input.set_input_files(['file1.pdf', 'file2.jpg'])  # Multiple files

# For other elements using the file chooser
async with page.expect_file_chooser(files='path/to/file.pdf'):
    upload_button = await page.find_element(By.ID, 'upload-button')
    await upload_button.click()
```
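
Both call styles above (a single path and a list of paths) can be reduced to one shape before use. A minimal sketch, assuming a hypothetical `normalize_files` helper that is not part of Pydoll:

```python
from pathlib import Path
from typing import List, Sequence, Union

PathLike = Union[str, Path]


def normalize_files(files: Union[PathLike, Sequence[PathLike]]) -> List[str]:
    """Return a list of path strings from a single path or a sequence of paths."""
    # A lone str or Path is wrapped in a list; str must be checked first
    # because a str is itself a sequence of characters.
    if isinstance(files, (str, Path)):
        return [str(files)]
    return [str(item) for item in files]
```

Normalizing early keeps the rest of an upload routine free of `isinstance` checks.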

### 🌐 Microsoft Edge Support

Now with Edge browser support thanks to [@Harris-H](https://github.com/Harris-H):

```python
import asyncio
from pydoll.browser import Edge
from pydoll.browser.options import EdgeOptions

async def main():
    options = EdgeOptions()
    # options.add_argument('--headless')

    async with Edge(options=options) as browser:
        await browser.start()
        page = await browser.get_page()
        await page.go_to('https://example.com')

asyncio.run(main())
```

## 🎯 Core Components

Pydoll offers three main interfaces for browser automation:

### Browser Interface

The Browser interface provides global control over the entire browser instance:

```python
async def browser_demo():
    async with Chrome() as browser:
        await browser.start()

        # Create multiple pages
        pages = [await browser.get_page() for _ in range(3)]

        # Control the browser window
        await browser.set_window_maximized()

        # Manage cookies globally
        await browser.set_cookies([{
            'name': 'session',
            'value': '12345',
            'domain': 'example.com'
        }])
```

#### Key Browser Methods

| Method | Description | Example |
|--------|-------------|---------|
| `async start()` | 🔥 Launch your browser and prepare for automation | `await browser.start()` |
| `async stop()` | 👋 Close the browser gracefully when finished | `await browser.stop()` |
| `async get_page()` | ✨ Get an existing page or create a new one | `page = await browser.get_page()` |
| `async new_page(url='')` | 🆕 Create a new page in the browser | `page_id = await browser.new_page()` |
| `async get_page_by_id(page_id)` | 🔍 Find and control a specific page by ID | `page = await browser.get_page_by_id(id)` |
| `async get_targets()` | 🎯 List all open pages in the browser | `targets = await browser.get_targets()` |
| `async set_window_bounds(bounds)` | 📐 Size and position the browser window | `await browser.set_window_bounds({'width': 1024})` |
| `async set_window_maximized()` | 💪 Maximize the browser window | `await browser.set_window_maximized()` |
| `async get_cookies()` | 🍪 Get all browser cookies | `cookies = await browser.get_cookies()` |
| `async set_cookies(cookies)` | 🧁 Set custom cookies for authentication | `await browser.set_cookies([{...}])` |
| `async delete_all_cookies()` | 🧹 Clear all cookies for a fresh state | `await browser.delete_all_cookies()` |
| `async set_download_path(path)` | 📂 Configure where downloaded files are saved | `await browser.set_download_path('/downloads')` |

### Page Interface

The Page interface lets you control individual browser tabs and interact with web content:

```python
async def page_demo():
    page = await browser.get_page()

    # Navigation
    await page.go_to('https://example.com')
    await page.refresh()

    # Get page info
    url = await page.current_url
    html = await page.page_source

    # Screenshots and PDF
    await page.get_screenshot('screenshot.png')
    await page.print_to_pdf('page.pdf')

    # Execute JavaScript
    title = await page.execute_script('return document.title')
```

#### Key Page Methods

| Method | Description | Example |
|--------|-------------|---------|
| `async go_to(url, timeout=300)` | 🚀 Navigate to a URL with loading detection | `await page.go_to('https://example.com')` |
| `async refresh()` | 🔄 Reload the current page | `await page.refresh()` |
| `async close()` | 🚪 Close the current tab | `await page.close()` |
| `async current_url` | 🧭 Get the current page URL | `url = await page.current_url` |
| `async page_source` | 📝 Get the page's HTML content | `html = await page.page_source` |
| `async get_screenshot(path)` | 📸 Save a screenshot of the page | `await page.get_screenshot('shot.png')` |
| `async print_to_pdf(path)` | 📄 Convert the page to a PDF document | `await page.print_to_pdf('page.pdf')` |
| `async has_dialog()` | 🔔 Check if a dialog is present | `if await page.has_dialog():` |
| `async accept_dialog()` | 👍 Accept alert and confirmation dialogs | `await page.accept_dialog()` |
| `async execute_script(script, element)` | ⚡ Run JavaScript code on the page | `await page.execute_script('alert("Hi!")')` |
| `async get_network_logs(matches=[])` | 🕸️ Monitor network requests | `logs = await page.get_network_logs()` |
| `async find_element(by, value)` | 🔎 Find an element on the page | `el = await page.find_element(By.ID, 'btn')` |
| `async find_elements(by, value)` | 🔍 Find multiple elements matching a selector | `items = await page.find_elements(By.CSS_SELECTOR, 'li')` |
| `async wait_element(by, value, timeout=10)` | ⏳ Wait for an element to appear | `await page.wait_element(By.ID, 'loaded', 5)` |

### WebElement Interface

The WebElement interface provides methods to interact with DOM elements:

```python
async def element_demo():
    # Find elements
    button = await page.find_element(By.CSS_SELECTOR, 'button.submit')
    input_field = await page.find_element(By.ID, 'username')

    # Get properties
    button_text = await button.get_element_text()
    is_button_enabled = button.is_enabled
    input_value = input_field.value

    # Interact with elements
    await button.scroll_into_view()
    await input_field.type_keys("user123")
    await button.click()
```

#### Key WebElement Methods

| Method | Description | Example |
|--------|-------------|---------|
| `value` | 💬 Get the value of an input element | `value = input_field.value` |
| `class_name` | 🎨 Get the element's CSS classes | `classes = element.class_name` |
| `id` | 🏷️ Get the element's ID attribute | `id = element.id` |
| `is_enabled` | ✅ Check if the element is enabled | `if button.is_enabled:` |
| `async bounds` | 📏 Get the element's position and size | `coords = await element.bounds` |
| `async inner_html` | 🧩 Get the element's inner HTML content | `html = await element.inner_html` |
| `async get_element_text()` | 📜 Get the element's text content | `text = await element.get_element_text()` |
| `get_attribute(name)` | 📊 Get any attribute from the element | `href = link.get_attribute('href')` |
| `async scroll_into_view()` | 👁️ Scroll the element into viewport | `await element.scroll_into_view()` |
| `async click(x_offset=0, y_offset=0)` | 👆 Click the element with optional offsets | `await button.click()` |
| `async click_using_js()` | 🔮 Click using JavaScript for hidden elements | `await overlay_button.click_using_js()` |
| `async send_keys(text)` | ⌨️ Send text to input fields | `await input.send_keys("text")` |
| `async type_keys(text, interval=0.1)` | 👨‍💻 Type text with realistic timing | `await input.type_keys("hello", 0.2)` |
| `async get_screenshot(path)` | 📷 Take a screenshot of the element | `await error.get_screenshot('error.png')` |
| `async set_input_files(files)` | 📤 Upload files with file inputs | `await input.set_input_files('file.pdf')` |

## 🚀 Advanced Features

### Event System

Pydoll's powerful event system lets you react to browser events in real-time:

```python
from pydoll.events.page import PageEvents
from pydoll.events.network import NetworkEvents
from functools import partial

# Page navigation events
async def on_page_loaded(event):
    print(f"🌐 Page loaded: {event['params'].get('url')}")

await page.enable_page_events()
await page.on(PageEvents.PAGE_LOADED, on_page_loaded)

# Network request monitoring
async def on_request(page, event):
    url = event['params']['request']['url']
    print(f"🔄 Request to: {url}")

await page.enable_network_events()
await page.on(NetworkEvents.REQUEST_WILL_BE_SENT, partial(on_request, page))

# DOM change monitoring
from pydoll.events.dom import DomEvents
await page.enable_dom_events()
await page.on(DomEvents.DOCUMENT_UPDATED, lambda e: print("DOM updated!"))
```
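
The `partial(on_request, page)` call above pre-binds the first argument, so the registered callback still accepts a single `event`. A minimal sketch of that pattern with a toy dispatcher; `TinyDispatcher` is hypothetical and stands in for Pydoll's internals, which this README does not show:

```python
import asyncio
from collections import defaultdict
from functools import partial
from typing import Any, Awaitable, Callable, DefaultDict, Dict, List

Callback = Callable[[Dict[str, Any]], Awaitable[None]]


class TinyDispatcher:
    """Toy event registry: maps an event name to async callbacks."""

    def __init__(self) -> None:
        self._callbacks: DefaultDict[str, List[Callback]] = defaultdict(list)

    def on(self, event_name: str, callback: Callback) -> None:
        self._callbacks[event_name].append(callback)

    async def emit(self, event: Dict[str, Any]) -> None:
        # Every callback registered for this method receives the event dict.
        for callback in self._callbacks[event["method"]]:
            await callback(event)


async def on_request(seen: List[str], event: Dict[str, Any]) -> None:
    # `seen` is pre-bound with partial; only `event` arrives from the dispatcher.
    seen.append(event["params"]["request"]["url"])


async def demo() -> List[str]:
    seen: List[str] = []
    dispatcher = TinyDispatcher()
    dispatcher.on("Network.requestWillBeSent", partial(on_request, seen))
    await dispatcher.emit({
        "method": "Network.requestWillBeSent",
        "params": {"request": {"url": "https://example.com"}},
    })
    return seen
```

Binding context with `partial` avoids module-level globals and keeps each handler easy to test on its own.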

### Request Interception

Pydoll gives you the power to intercept and modify network requests before they're sent! This allows you to customize headers or modify request data on the fly.

#### Basic Request Modification

The request interception system lets you monitor and modify requests before they're sent:

```python
from pydoll.events.fetch import FetchEvents
from pydoll.commands.fetch import FetchCommands
from functools import partial

async def request_interceptor(page, event):
    request_id = event['params']['requestId']
    url = event['params']['request']['url']

    print(f"🔎 Intercepted request to: {url}")

    # Continue the request normally
    await page._execute_command(
        FetchCommands.continue_request(
            request_id=request_id
        )
    )

# Enable interception and register your handler
await page.enable_fetch_events()
await page.on(FetchEvents.REQUEST_PAUSED, partial(request_interceptor, page))
```

#### Adding Custom Headers

Inject authentication or tracking headers into specific requests:

```python
async def auth_header_interceptor(page, event):
    request_id = event['params']['requestId']
    url = event['params']['request']['url']

    # Only add auth headers to API requests
    if '/api/' in url:
        # Get the original headers
        original_headers = event['params']['request'].get('headers', {})

        # Add your custom headers
        custom_headers = {
            **original_headers,
            'Authorization': 'Bearer your-token-123',
            'X-Custom-Track': 'pydoll-automation'
        }

        await page._execute_command(
            FetchCommands.continue_request(
                request_id=request_id,
                headers=custom_headers
            )
        )
    else:
        # Continue normally for non-API requests
        await page._execute_command(
            FetchCommands.continue_request(
                request_id=request_id
            )
        )

await page.enable_fetch_events()
await page.on(FetchEvents.REQUEST_PAUSED, partial(auth_header_interceptor, page))
```
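
The header merge in the interceptor above relies on dict unpacking, where later keys win. HTTP header names are case-insensitive, so a plain dict merge can leave both `authorization` and `Authorization` in the result. A minimal sketch of a case-insensitive merge, using a hypothetical `merge_headers` helper that is not part of Pydoll:

```python
from typing import Dict


def merge_headers(original: Dict[str, str], extra: Dict[str, str]) -> Dict[str, str]:
    """Merge headers, letting `extra` override `original` regardless of case."""
    overridden = {name.lower() for name in extra}
    # Keep only original headers that are not being replaced.
    merged = {
        name: value
        for name, value in original.items()
        if name.lower() not in overridden
    }
    merged.update(extra)
    return merged
```

In the interceptor, the result of `merge_headers(original_headers, {...})` could take the place of the `custom_headers` literal.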

#### Modifying Request Body

Change POST data before it's sent:

```python
async def modify_request_body(page, event):
    request_id = event['params']['requestId']
    url = event['params']['request']['url']
    method = event['params']['request'].get('method', '')

    # Only modify POST requests to specific endpoints
    if method == 'POST' and 'submit-form' in url:
        # Get original request body if it exists
        original_body = event['params']['request'].get('postData', '{}')

        # In a real scenario, you'd parse and modify the body
        # For this example, we're just replacing it
        new_body = '{"modified": true, "data": "enhanced-by-pydoll"}'

        print(f"✏️ Modifying POST request to: {url}")
        await page._execute_command(
            FetchCommands.continue_request(
                request_id=request_id,
                post_data=new_body
            )
        )
    else:
        # Continue normally for other requests
        await page._execute_command(
            FetchCommands.continue_request(
                request_id=request_id
            )
        )

await page.enable_fetch_events()
await page.on(FetchEvents.REQUEST_PAUSED, partial(modify_request_body, page))
```
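
The comment in the example above notes that a real interceptor would parse and modify the body rather than replace it. A minimal sketch of that step for JSON payloads, assuming a hypothetical `patch_json_body` helper (not part of Pydoll) whose return value would be passed as `post_data`:

```python
import json
from typing import Any, Dict


def patch_json_body(original_body: str, updates: Dict[str, Any]) -> str:
    """Parse a JSON object body, apply `updates`, and serialize it again."""
    try:
        payload = json.loads(original_body) if original_body else {}
    except json.JSONDecodeError:
        # Not JSON (for example form-encoded data): leave it untouched.
        return original_body
    if not isinstance(payload, dict):
        # Only JSON objects can take key/value updates.
        return original_body
    payload.update(updates)
    return json.dumps(payload)
```

Returning the original string for non-JSON bodies keeps the interceptor from corrupting form submissions it was not written for.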

### Filtering Request Types

You can focus on specific types of requests to intercept:

```python
# Just intercept XHR requests
await page.enable_fetch_events(resource_type='xhr')

# Or focus on document requests
await page.enable_fetch_events(resource_type='document')

# Or maybe just images
await page.enable_fetch_events(resource_type='image')
```

Available resource types include: `document`, `stylesheet`, `image`, `media`, `font`, `script`, `texttrack`, `xhr`, `fetch`, `eventsource`, `websocket`, `manifest`, `other`.
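
Because `resource_type` is passed as a string, a typo would only surface at runtime. A minimal sketch of an up-front check against the list above; `validate_resource_type` is a hypothetical helper, and accepting mixed-case input is an assumption of this sketch rather than documented Pydoll behavior:

```python
RESOURCE_TYPES = frozenset({
    "document", "stylesheet", "image", "media", "font", "script",
    "texttrack", "xhr", "fetch", "eventsource", "websocket",
    "manifest", "other",
})


def validate_resource_type(resource_type: str) -> str:
    """Return the normalized resource type or raise ValueError if unknown."""
    normalized = resource_type.strip().lower()
    if normalized not in RESOURCE_TYPES:
        raise ValueError(
            f"unknown resource type {resource_type!r}; "
            f"expected one of {sorted(RESOURCE_TYPES)}"
        )
    return normalized
```

The validated value could then be handed to `page.enable_fetch_events(resource_type=...)`.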

### Concurrent Automation

Process multiple pages simultaneously for maximum efficiency:

```python
async def process_page(url):
    page = await browser.get_page()
    await page.go_to(url)
    # Do your scraping or automation here
    # page_source is the Page-level accessor listed under Key Page Methods
    return await page.page_source

# Process multiple URLs concurrently
urls = ['https://example1.com', 'https://example2.com', 'https://example3.com']
results = await asyncio.gather(*(process_page(url) for url in urls))
```
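
With many URLs, an unbounded `asyncio.gather` opens every page at once. A minimal sketch of capping concurrency with `asyncio.Semaphore`; `fake_process_page` is a stand-in for the `process_page` coroutine above, and the limit of 2 is arbitrary:

```python
import asyncio
from typing import Awaitable, Callable, List


async def gather_limited(
    urls: List[str],
    worker: Callable[[str], Awaitable[str]],
    limit: int = 2,
) -> List[str]:
    """Run `worker` over `urls` with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url: str) -> str:
        async with semaphore:
            return await worker(url)

    # gather preserves the order of the input URLs in its results.
    return await asyncio.gather(*(guarded(url) for url in urls))


async def fake_process_page(url: str) -> str:
    # Stand-in for real page work; sleeps briefly and echoes the URL.
    await asyncio.sleep(0.01)
    return f"done: {url}"
```

Swapping `fake_process_page` for a real coroutine keeps the same call shape: `await gather_limited(urls, process_page, limit=5)`.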

## 💡 Best Practices

Maximize your Pydoll experience with these tips:

✅ **Embrace async patterns** throughout your code for best performance
✅ **Use specific selectors** (IDs, unique attributes) for reliable element finding
✅ **Implement proper error handling** with try/except blocks around critical operations
✅ **Leverage the event system** instead of polling for state changes
✅ **Properly close resources** with async context managers
✅ **Wait for elements** instead of fixed sleep delays
✅ **Use realistic interactions** like `type_keys()` to avoid detection
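
Two of the tips above (error handling, and waiting instead of sleeping) can be combined in a small retry wrapper. A minimal sketch using only `asyncio`; `retry_with_timeout` and its parameters are hypothetical names for illustration, not Pydoll API:

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_timeout(
    operation: Callable[[], Awaitable[T]],
    attempts: int = 3,
    timeout: float = 5.0,
    delay: float = 0.1,
) -> T:
    """Await `operation()` with a per-attempt timeout, retrying on failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(operation(), timeout=timeout)
        except Exception:
            # Re-raise after the last attempt so callers see the real error.
            if attempt == attempts:
                raise
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")
```

A call such as `await retry_with_timeout(lambda: page.wait_element(By.ID, 'loaded'))` would wrap the `wait_element` method listed earlier; the factory form (a lambda returning a fresh coroutine) matters because a coroutine object can only be awaited once.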

## 🤝 Contributing

We'd love your help making Pydoll even better! Check out our [contribution guidelines](CONTRIBUTING.md) to get started. Whether it's fixing bugs, adding features, or improving documentation - all contributions are welcome!

Please make sure to:
- Write tests for new features or bug fixes
- Follow coding style and conventions
- Use conventional commits for pull requests
- Run lint and test checks before submitting

## 🔮 Coming Soon

Get ready for these upcoming features in Pydoll:

🔹 **Auto-detection of Cloudflare Captcha** - Automatic solving without manual intervention
🔹 **Fingerprint Generation & Rotation** - Dynamic browser fingerprints to avoid detection
🔹 **Proxy Rotation** - Seamless IP switching for extended scraping sessions
🔹 **Shadow DOM Access** - Navigate and interact with Shadow Root elements

Stay tuned and star the repository to get updates when these features are released!

## 📄 License

Pydoll is licensed under the [MIT License](LICENSE).

---

<p align="center">
    <b>Pydoll</b> - Making browser automation magical! ✨
</p>
