puppeteer-bidi 0.0.3.beta1 → 0.0.3.beta2

@@ -1,234 +0,0 @@
1
- # Porting Puppeteer to Ruby
2
-
3
- Best practices for implementing Puppeteer features in puppeteer-bidi.
4
-
5
- ## 1. Reference Implementation First
6
-
7
- **Always consult the official Puppeteer implementation before implementing features:**
8
-
9
- - **TypeScript source files**:
10
- - `packages/puppeteer-core/src/bidi/Page.ts` - High-level Page API
11
- - `packages/puppeteer-core/src/bidi/BrowsingContext.ts` - Core BiDi context
12
- - `packages/puppeteer-core/src/api/Page.ts` - Common Page interface
13
-
14
- - **Test files**:
15
- - `test/src/screenshot.spec.ts` - Screenshot test suite
16
- - `test/golden-firefox/` - Golden images for visual regression testing
17
-
18
- **Example workflow:**
19
-
20
- ```ruby
21
- # 1. Read Puppeteer's TypeScript implementation
22
- # 2. Understand the BiDi protocol calls being made
23
- # 3. Implement Ruby equivalent with same logic flow
24
- # 4. Port corresponding test cases
25
- ```
26
-
27
- ## 2. Test Infrastructure Setup
28
-
29
- **Use async-http for test servers** (lightweight + Async-friendly):
30
-
31
- ```ruby
32
- # spec/support/test_server.rb
33
- endpoint = Async::HTTP::Endpoint.parse("http://127.0.0.1:#{@port}")
34
-
35
- server = Async::HTTP::Server.for(endpoint) do |request|
36
-   if handler = lookup_route(request.path)
37
-     notify_request(request.path)
38
-     respond_with_handler(handler, request)
39
-   else
40
-     serve_static_asset(request)
41
-   end
42
- end
43
-
44
- server.run
45
- ```
46
-
47
- **Helper pattern for integration tests:**
48
-
49
- ```ruby
50
- # Optimized helper - reuses shared browser, creates new page per test
51
- def with_test_state
52
-   page = $shared_browser.new_page
53
-   context = $shared_browser.default_browser_context
54
-
55
-   begin
56
-     yield(page: page, server: $shared_test_server, browser: $shared_browser, context: context)
57
-   ensure
58
-     page.close unless page.closed?
59
-   end
60
- end
61
- ```
62
-
63
- ## 3. BiDi Protocol Data Deserialization
64
-
65
- **BiDi returns values in a serialized format, so always deserialize them:**
66
-
67
- ```ruby
68
- # BiDi response format:
69
- # [["width", {"type" => "number", "value" => 500}],
70
- # ["height", {"type" => "number", "value" => 1000}]]
71
-
72
- def deserialize_result(result)
73
-   value = result['value']
74
-   return value unless value.is_a?(Array)
75
-
76
-   # Convert to Ruby Hash
77
-   if value.all? { |item| item.is_a?(Array) && item.length == 2 }
78
-     value.each_with_object({}) do |(key, val), hash|
79
-       hash[key] = deserialize_value(val)
80
-     end
81
-   else
82
-     value
83
-   end
84
- end
85
-
86
- def deserialize_value(val)
87
-   case val['type']
88
-   when 'number' then val['value']
89
-   when 'string' then val['value']
90
-   when 'boolean' then val['value']
91
-   when 'undefined', 'null' then nil
92
-   else val['value']
93
-   end
94
- end
95
- ```
96
-
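The helper above handles one level of key/value pairs. As a minimal sketch of how nested results could be handled, the following recursive version walks `array` and `object` values and maps the special number strings that BiDi uses for non-finite values. The name `deserialize_remote_value` and the `SPECIAL_NUMBERS` table are hypothetical and not part of puppeteer-bidi; the sketch also assumes every `array` and `object` payload carries a `value` field.

```ruby
# Hypothetical mapping for numbers that BiDi serializes as strings.
SPECIAL_NUMBERS = {
  'NaN' => Float::NAN,
  'Infinity' => Float::INFINITY,
  '-Infinity' => -Float::INFINITY,
  '-0' => -0.0
}.freeze

# Hypothetical helper: recursively converts a BiDi remote value into plain Ruby data.
def deserialize_remote_value(remote)
  case remote['type']
  when 'undefined', 'null'
    nil
  when 'number'
    value = remote['value']
    value.is_a?(String) ? SPECIAL_NUMBERS.fetch(value) : value
  when 'array'
    remote['value'].map { |item| deserialize_remote_value(item) }
  when 'object'
    remote['value'].each_with_object({}) do |(key, val), hash|
      # Keys arrive as plain strings or, for non-string keys, as serialized values.
      key = deserialize_remote_value(key) if key.is_a?(Hash)
      hash[key] = deserialize_remote_value(val)
    end
  else
    remote['value'] # string, boolean, and any type this sketch does not special-case
  end
end

size = deserialize_remote_value(
  {
    'type' => 'object',
    'value' => [
      ['width', { 'type' => 'number', 'value' => 500 }],
      ['height', { 'type' => 'number', 'value' => 1000 }],
      ['tags', { 'type' => 'array', 'value' => [{ 'type' => 'string', 'value' => 'a' }] }]
    ]
  }
)
p size
```

Recursing on the `type` tag keeps the conversion in one place, so callers never need to know how deeply a result is nested.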
97
- ## 4. Implementing Puppeteer-Compatible APIs
98
-
99
- **Follow Puppeteer's exact logic flow:**
100
-
101
- Example: `fullPage` screenshot implementation
102
-
103
- ```ruby
104
- # From Puppeteer's Page.ts:
105
- # if (options.fullPage) {
106
- #   if (!options.captureBeyondViewport) {
107
- #     // Resize viewport to full page
108
- #   }
109
- # } else {
110
- #   options.captureBeyondViewport = false;
111
- # }
112
-
113
- if full_page
114
-   unless capture_beyond_viewport
115
-     scroll_dimensions = evaluate(...)
116
-     set_viewport(scroll_dimensions)
117
-     begin
118
-       data = capture_screenshot(origin: 'viewport')
119
-     ensure
120
-       set_viewport(original_viewport) # Always restore
121
-     end
122
-   else
123
-     options[:origin] = 'document'
124
-   end
125
- elsif !clip
126
-   capture_beyond_viewport = false # Match Puppeteer behavior
127
- end
128
- ```
129
-
130
- **Key principles:**
131
-
132
- - Use `begin/ensure` blocks for cleanup (viewport restoration, etc.)
133
- - Match Puppeteer's parameter defaults exactly
134
- - Follow the same conditional logic order
135
-
136
- ## 5. Layer Architecture
137
-
138
- **Maintain clear separation:**
139
-
140
- ```
141
- High-level API (lib/puppeteer/bidi/)
142
- ├── Browser - User-facing browser interface
143
- ├── BrowserContext - Session management
144
- └── Page - Page automation API
145
-
146
- Core Layer (lib/puppeteer/bidi/core/)
147
- ├── Session - BiDi session management
148
- ├── Browser - Low-level browser operations
149
- ├── UserContext - BiDi user context
150
- └── BrowsingContext - BiDi browsing context (tab/frame)
151
- ```
152
-
153
- ## 6. Setting Page Content
154
-
155
- **Use data URLs with base64 encoding:**
156
-
157
- ```ruby
158
- def set_content(html, wait_until: 'load')
159
-   # Encode HTML in base64 to avoid URL encoding issues
160
-   encoded = Base64.strict_encode64(html)
161
-   data_url = "data:text/html;base64,#{encoded}"
162
-   goto(data_url, wait_until: wait_until)
163
- end
164
- ```
165
-
166
- **Why base64:**
167
-
168
- - Avoids URL encoding issues with special characters
169
- - Handles multi-byte characters correctly
170
- - Standard approach in browser automation tools
171
-
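To make the base64 reasoning concrete, here is a small standalone sketch that builds the same kind of data URL for HTML containing multi-byte text and URL-sensitive characters, then decodes it again. The helper name `html_data_url` is hypothetical; only `Base64.strict_encode64` and `Base64.strict_decode64` come from Ruby's `base64` library.

```ruby
require 'base64'

# Hypothetical helper mirroring the data-URL construction in set_content.
def html_data_url(html)
  "data:text/html;base64,#{Base64.strict_encode64(html)}"
end

html = '<p id="greeting">こんにちは & 100% #done?</p>'
url = html_data_url(html)

# The payload uses only the base64 alphabet, so nothing needs percent-encoding.
payload = url.delete_prefix('data:text/html;base64,')
puts payload.match?(%r{\A[A-Za-z0-9+/]*={0,2}\z})

# Decoding yields the original bytes; re-tag them as UTF-8 to compare strings.
decoded = Base64.strict_decode64(payload).force_encoding(Encoding::UTF_8)
puts decoded == html
```

`strict_encode64` is used rather than `encode64` because the latter inserts line breaks, which would not belong in a URL.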
172
- ## 7. Viewport Restoration
173
-
174
- **Always restore viewport after temporary changes:**
175
-
176
- ```ruby
177
- # Save current viewport (may be nil)
178
- original_viewport = viewport
179
-
180
- # If no viewport set, save window size
181
- unless original_viewport
182
-   original_size = evaluate('({ width: window.innerWidth, height: window.innerHeight })')
183
-   original_viewport = { width: original_size['width'].to_i, height: original_size['height'].to_i }
184
- end
185
-
186
- # Change viewport temporarily
187
- set_viewport(width: new_width, height: new_height)
188
-
189
- begin
190
-   # Do work
191
- ensure
192
-   # Always restore
193
-   set_viewport(**original_viewport) if original_viewport
194
- end
195
- ```
196
-
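The save, change, and restore steps above can be wrapped in a block helper so callers cannot forget the `ensure`. This is only a sketch: `FakePage`, its `window_size` method, and `with_temporary_viewport` are hypothetical stand-ins so the example runs without a browser.

```ruby
# Hypothetical stand-in for a page; only the pieces the sketch needs.
class FakePage
  attr_reader :viewport

  def initialize(viewport = nil)
    @viewport = viewport
  end

  def set_viewport(width:, height:)
    @viewport = { width: width, height: height }
  end

  # Stands in for evaluating window.innerWidth / window.innerHeight.
  def window_size
    { width: 1280, height: 720 }
  end
end

# Hypothetical helper: applies a temporary viewport, yields, then restores.
def with_temporary_viewport(page, width:, height:)
  original = page.viewport || page.window_size
  page.set_viewport(width: width, height: height)
  begin
    yield page
  ensure
    page.set_viewport(**original) # runs even if the block raises
  end
end

page = FakePage.new({ width: 800, height: 600 })
begin
  with_temporary_viewport(page, width: 1920, height: 5000) do |_page|
    raise 'screenshot failed' # simulate an error mid-work
  end
rescue RuntimeError
  # The error still reaches the caller, but the viewport is already restored.
end
p page.viewport
```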
197
- ## 8. Test Assets Policy
198
-
199
- **CRITICAL**: Always use Puppeteer's official test assets without modification.
200
-
201
- - **Source**: https://github.com/puppeteer/puppeteer/tree/main/test/assets
202
- - **Rule**: Never modify test asset files (HTML, CSS, images) in `spec/assets/`
203
- - **Verification**: Before creating a PR, verify that all `spec/assets/` files match Puppeteer's official versions
204
-
205
- ```bash
206
- # During development - OK to experiment
207
- vim spec/assets/test.html # Temporary modification for debugging
208
-
209
- # Before PR - MUST revert to official
210
- curl -sL https://raw.githubusercontent.com/puppeteer/puppeteer/main/test/assets/test.html \
211
-   -o spec/assets/test.html
212
- ```
213
-
214
- **Why this matters**: Test assets are designed to test specific edge cases (rotated elements, complex layouts, etc.). Using simplified versions defeats the purpose of these tests.
215
-
216
- ## 9. API Coverage Update
217
-
218
- **IMPORTANT**: When implementing new Puppeteer API methods, update `API_COVERAGE.md`:
219
-
220
- 1. Find the corresponding entry in the table (e.g., `Browser.userAgent`, `Page.setUserAgent`)
221
- 2. Change the status from `❌` to `✅`
222
- 3. Update the coverage count at the top of the file
223
-
224
- ```markdown
225
- # Before
226
- - Coverage: `156/274` (`56.93%`)
227
- | `Browser.userAgent` | `Puppeteer::Bidi::Browser#user_agent` | ❌ |
228
-
229
- # After (implemented 2 new methods: 156 + 2 = 158)
230
- - Coverage: `158/274` (`57.66%`)
231
- | `Browser.userAgent` | `Puppeteer::Bidi::Browser#user_agent` | ✅ |
232
- ```
233
-
234
- **CI will fail** if API_COVERAGE.md is not updated when new methods are implemented. The API Coverage check compares implemented methods against the coverage file.
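The coverage figures in the example can be recomputed rather than edited by hand. The following sketch reproduces both lines from the counts; `coverage_line` is a hypothetical helper, and it assumes the percentage is rounded to two decimals, which matches the two values shown above.

```ruby
# Hypothetical helper: formats a coverage line like the one in the example above.
def coverage_line(implemented, total)
  percent = format('%.2f', implemented * 100.0 / total)
  "- Coverage: `#{implemented}/#{total}` (`#{percent}%`)"
end

puts coverage_line(156, 274)
puts coverage_line(158, 274)
```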
@@ -1,194 +0,0 @@
1
- # QueryHandler Implementation
2
-
3
- The QueryHandler system provides extensible selector handling for CSS, XPath, text, and other selector types.
4
-
5
- ## Architecture
6
-
7
- ```
8
- QueryHandler (singleton)
9
- ├── get_query_handler_and_selector(selector)
10
- │ └── Returns: { updated_selector, polling, query_handler }
11
-
12
- BaseQueryHandler
13
- ├── run_query_one(element, selector) → ElementHandle | nil
14
- ├── run_query_all(element, selector) → Array<ElementHandle>
15
- └── wait_for(element_or_frame, selector, options)
16
-
17
- Implementations:
18
- ├── CSSQueryHandler - Default, uses cssQuerySelector/cssQuerySelectorAll
19
- ├── XPathQueryHandler - xpath/ prefix, uses xpathQuerySelectorAll
20
- └── TextQueryHandler - text/ prefix, uses textQuerySelectorAll
21
- ```
22
-
23
- ## Selector Prefixes
24
-
25
- | Prefix | Handler | Example |
26
- | --------- | ------------------ | --------------------------- |
27
- | (none) | CSSQueryHandler | `div.foo`, `#id` |
28
- | `xpath/` | XPathQueryHandler | `xpath/html/body/div` |
29
- | `text/` | TextQueryHandler | `text/Hello World` |
30
- | `aria/` | ARIAQueryHandler | `aria/Submit[role="button"]` |
31
- | `pierce/` | PierceQueryHandler | `pierce/.shadow-element` |
32
-
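As an illustration of the prefix table, the sketch below splits a selector into a handler name and the remaining selector text. It is not the library's `get_query_handler_and_selector`; `PREFIX_HANDLERS` and `split_selector` are hypothetical, handler names are plain strings, and polling is left out.

```ruby
# Prefix table copied from the section above; handler names are plain strings here.
PREFIX_HANDLERS = {
  'xpath/' => 'XPathQueryHandler',
  'text/' => 'TextQueryHandler',
  'aria/' => 'ARIAQueryHandler',
  'pierce/' => 'PierceQueryHandler'
}.freeze

# Hypothetical helper: returns [handler_name, selector_without_prefix].
def split_selector(selector)
  PREFIX_HANDLERS.each do |prefix, handler|
    return [handler, selector.delete_prefix(prefix)] if selector.start_with?(prefix)
  end
  ['CSSQueryHandler', selector] # no known prefix: fall back to CSS
end

p split_selector('xpath/html/body/div')
p split_selector('text/Hello World')
p split_selector('div.foo')
```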
33
- ## Implementation Pattern
34
-
35
- All query handlers follow the same pattern: override `query_one_script`, `query_all_script`, and `wait_for_selector_script` to define the JavaScript that runs in the browser.
36
-
37
- ```ruby
38
- class CSSQueryHandler < BaseQueryHandler
39
-   private
40
-
41
-   def query_one_script
42
-     <<~JAVASCRIPT
43
-       (PuppeteerUtil, element, selector) => {
44
-         return PuppeteerUtil.cssQuerySelector(element, selector);
45
-       }
46
-     JAVASCRIPT
47
-   end
48
-
49
-   def query_all_script
50
-     <<~JAVASCRIPT
51
-       async (PuppeteerUtil, element, selector) => {
52
-         return [...PuppeteerUtil.cssQuerySelectorAll(element, selector)];
53
-       }
54
-     JAVASCRIPT
55
-   end
56
-
57
-   def wait_for_selector_script
58
-     <<~JAVASCRIPT
59
-       (PuppeteerUtil, selector, root, visibility) => {
60
-         const element = PuppeteerUtil.cssQuerySelector(root || document, selector);
61
-         return PuppeteerUtil.checkVisibility(element, visibility === null ? undefined : visibility);
62
-       }
63
-     JAVASCRIPT
64
-   end
65
- end
66
- ```
67
-
68
- ## TextQueryHandler - Special Case
69
-
70
- `textQuerySelectorAll` cannot be extracted via `toString()` because it references helper functions (`f`, `m`, `d`) that are only available within the PuppeteerUtil closure. So TextQueryHandler uses a different pattern: call `textQuerySelectorAll` directly from PuppeteerUtil instead of recreating the function.
71
-
72
- ```ruby
73
- class TextQueryHandler < BaseQueryHandler
74
-   private
75
-
76
-   def query_one_script
77
-     <<~JAVASCRIPT
78
-       (PuppeteerUtil, element, selector) => {
79
-         for (const result of PuppeteerUtil.textQuerySelectorAll(element, selector)) {
80
-           return result;
81
-         }
82
-         return null;
83
-       }
84
-     JAVASCRIPT
85
-   end
86
-
87
-   def query_all_script
88
-     <<~JAVASCRIPT
89
-       async (PuppeteerUtil, element, selector) => {
90
-         return [...PuppeteerUtil.textQuerySelectorAll(element, selector)];
91
-       }
92
-     JAVASCRIPT
93
-   end
94
- end
95
- ```
96
-
97
- ## Handle Adoption Pattern
98
-
99
- After navigation, the sandbox realm is destroyed and `puppeteer_util` handles become stale. The solution is to adopt the element into the isolated realm BEFORE calling any query methods:
100
-
101
- ```ruby
102
- def run_query_one(element, selector)
103
-   realm = element.frame.isolated_realm
104
-
105
-   # Adopt the element into the isolated realm first.
106
-   # This ensures the realm is valid and triggers puppeteer_util reset if needed
107
-   # after navigation (mirrors Puppeteer's @bindIsolatedHandle decorator pattern).
108
-   adopted_element = realm.adopt_handle(element)
109
-
110
-   result = realm.call_function(
111
-     query_one_script,
112
-     false,
113
-     arguments: [
114
-       Serializer.serialize(realm.puppeteer_util_lazy_arg),
115
-       adopted_element.remote_value,
116
-       Serializer.serialize(selector)
117
-     ]
118
-   )
119
-
120
-   # ... handle result ...
121
- ensure
122
-   adopted_element&.dispose
123
- end
124
- ```
125
-
126
- **Why this matters:**
127
-
128
- 1. After navigation, the sandbox realm is destroyed
129
- 2. The `:updated` event that resets `puppeteer_util` is not fired until we call into the sandbox
130
- 3. Calling `adopt_handle` first ensures the realm exists and is valid
131
- 4. `puppeteer_util` is then fresh (re-evaluated if the realm was recreated)
132
-
133
- ## Debugging
134
-
135
- Use `DEBUG_BIDI_COMMAND=1` to see protocol messages:
136
-
137
- ```bash
138
- DEBUG_BIDI_COMMAND=1 bundle exec rspec spec/integration/queryhandler_spec.rb:8
139
- ```
140
-
141
- This shows:
142
- 1. PuppeteerUtil being evaluated in sandbox
143
- 2. The query function being called with arguments
144
- 3. The result being returned
145
-
146
- ## Adding New Query Handlers
147
-
148
- 1. Create a new class extending `BaseQueryHandler`
149
- 2. Override `query_one_script`, `query_all_script`, and `wait_for_selector_script`
150
- 3. Register in `BUILTIN_QUERY_HANDLERS` constant
151
- 4. The query scripts receive `(PuppeteerUtil, element, selector)` as arguments; `wait_for_selector_script` receives `(PuppeteerUtil, selector, root, visibility)`
152
- 5. Return the element(s) found or null/empty array
153
-
154
- ```ruby
155
- class MyQueryHandler < BaseQueryHandler
156
-   private
157
-
158
-   def query_one_script
159
-     <<~JAVASCRIPT
160
-       (PuppeteerUtil, element, selector) => {
161
-         // Use PuppeteerUtil.myQuerySelector if available
162
-         // Or implement custom logic
163
-         return element.querySelector(selector);
164
-       }
165
-     JAVASCRIPT
166
-   end
167
-
168
-   def query_all_script
169
-     <<~JAVASCRIPT
170
-       async (PuppeteerUtil, element, selector) => {
171
-         return [...element.querySelectorAll(selector)];
172
-       }
173
-     JAVASCRIPT
174
-   end
175
-
176
-   def wait_for_selector_script
177
-     <<~JAVASCRIPT
178
-       (PuppeteerUtil, selector, root, visibility) => {
179
-         const element = (root || document).querySelector(selector);
180
-         return PuppeteerUtil.checkVisibility(element, visibility === null ? undefined : visibility);
181
-       }
182
-     JAVASCRIPT
183
-   end
184
- end
185
- ```
186
-
187
- ## Test Coverage
188
-
189
- Tests are in `spec/integration/queryhandler_spec.rb`:
190
-
191
- - Text selectors: 12 tests (query_selector, query_selector_all, shadow DOM piercing, etc.)
192
- - XPath selectors: 6 tests (in Page and ElementHandle)
193
-
194
- Tests ported from Puppeteer's `test/src/queryhandler.spec.ts`.
@@ -1,111 +0,0 @@
1
- # ReactorRunner - Using Browser Outside Sync Blocks
2
-
3
- ## Problem
4
-
5
- The socketry/async library requires all async operations to run inside a `Sync do ... end` block. However, some use cases cannot wrap their entire code in a Sync block:
6
-
7
- ```ruby
8
- # This pattern doesn't work with plain async:
9
- browser = Puppeteer::Bidi.launch_browser_instance(headless: true)
10
- at_exit { browser.close } # Called outside any Sync block!
11
-
12
- Sync do
13
-   page = browser.new_page
14
-   page.goto("https://example.com")
15
- end
16
- ```
17
-
18
- The `at_exit` hook runs after the Sync block has finished, so `browser.close` would fail.
19
-
20
- ## Solution: ReactorRunner
21
-
22
- `ReactorRunner` creates a dedicated Async reactor in a background thread and provides a way to execute code within that reactor from any thread.
23
-
24
- ### How It Works
25
-
26
- 1. **Background Thread with Reactor**: ReactorRunner spawns a new thread that runs `Sync do ... end` with an `Async::Queue` for receiving jobs
27
- 2. **Proxy Pattern**: Returns a `Proxy` object that wraps the real Browser and forwards all method calls through the ReactorRunner
28
- 3. **Automatic Detection**: `launch_browser_instance` and `connect_to_browser_instance` check `Async::Task.current` to decide whether to use ReactorRunner
29
-
30
- ### Architecture
31
-
32
- ```
33
- Main Thread                 Background Thread (ReactorRunner)
34
- │                                   │
35
- │ launch_browser_instance()         │
36
- │ ─────────────────────────────────>│  Sync do
37
- │                                   │    Browser.launch()
38
- │ <─────────────────────────────────│    (browser created)
39
- │ returns Proxy                     │
40
- │                                   │
41
- │ proxy.new_page()                  │
42
- │ ─────────────────────────────────>│    browser.new_page()
43
- │ <─────────────────────────────────│    (returns page)
44
- │                                   │
45
- │ at_exit { proxy.close }           │
46
- │ ─────────────────────────────────>│    browser.close()
47
- │                                   │  end
48
- │                                   │
49
- ```
50
-
51
- ### Key Components
52
-
53
- #### ReactorRunner
54
-
55
- - Creates background thread with `Sync` reactor
56
- - Uses `Async::Queue` to receive jobs from other threads
57
- - `sync(&block)` method executes block in reactor and returns result
58
- - Handles proper cleanup when closed
59
-
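The job-queue idea behind `sync(&block)` can be sketched with the standard library alone. Note the substitution: this example uses a plain thread and `Thread::Queue` instead of an Async reactor and `Async::Queue`, so it illustrates only the hand-off and error propagation, not the real ReactorRunner. The class name `MiniRunner` is hypothetical.

```ruby
# Stdlib-only sketch: a background thread drains a job queue, and callers
# block until the result of their own job is ready.
class MiniRunner
  def initialize
    @jobs = Thread::Queue.new
    @thread = Thread.new do
      # Queue#pop returns nil once the queue is closed and empty, ending the loop.
      while (job = @jobs.pop)
        block, reply = job
        begin
          reply << [:ok, block.call]
        rescue StandardError => e
          reply << [:error, e]
        end
      end
    end
  end

  # Runs the block on the background thread and returns its value;
  # errors raised there are re-raised in the calling thread.
  def sync(&block)
    reply = Thread::Queue.new
    @jobs << [block, reply]
    status, value = reply.pop
    raise value if status == :error

    value
  end

  def close
    @jobs.close
    @thread.join
  end
end

runner = MiniRunner.new
main = Thread.current
puts runner.sync { Thread.current != main } # the block ran off the main thread
begin
  runner.sync { raise ArgumentError, 'boom' }
rescue ArgumentError => e
  puts "propagated: #{e.message}"
end
runner.close
```

Giving each job its own reply queue keeps results from different callers separate without extra locking.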
60
- #### ReactorRunner::Proxy
61
-
62
- - Extends `SimpleDelegator` for transparent method forwarding
63
- - Wraps/unwraps return values (e.g., Page becomes Proxy too)
64
- - `owns_runner: true` means closing browser also closes the ReactorRunner
65
- - Handles edge cases like calling `close` after runner is already closed
66
-
67
- ### Usage Patterns
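The forwarding and re-wrapping behaviour can be sketched with `SimpleDelegator` as follows. Everything except `SimpleDelegator` itself is hypothetical: `InlineRunner` stands in for ReactorRunner and simply yields, `Browser` and `Page` are toy classes, and `RunnerProxy` is not the library's `Proxy`.

```ruby
require 'delegate'

# Hypothetical stand-in for the runner: a real one would hop to the reactor thread.
class InlineRunner
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def sync
    @calls += 1
    yield
  end
end

# Hypothetical domain objects.
class Page
  def title
    'Example'
  end
end

class Browser
  def new_page
    Page.new
  end

  def version
    '1.0'
  end
end

# Forwards every call through the runner and wraps proxyable results again.
class RunnerProxy < SimpleDelegator
  PROXYABLE = [::Browser, ::Page].freeze

  def initialize(target, runner)
    super(target)
    @runner = runner
  end

  def method_missing(name, *args, **kwargs, &block)
    target = __getobj__
    return super unless target.respond_to?(name)

    result = @runner.sync { target.public_send(name, *args, **kwargs, &block) }
    PROXYABLE.any? { |klass| klass === result } ? ::RunnerProxy.new(result, @runner) : result
  end
end

runner = InlineRunner.new
browser = RunnerProxy.new(Browser.new, runner)
page = browser.new_page # forwarded through the runner; the Page comes back wrapped
puts page.title         # also forwarded through the runner
puts runner.calls
```

Wrapping returned pages is what keeps later calls such as `page.title` on the runner instead of leaking onto the calling thread.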
68
-
69
- #### Pattern 1: Block-based (Recommended)
70
-
71
- ```ruby
72
- Puppeteer::Bidi.launch do |browser|
73
-   page = browser.new_page
74
-   # ... use browser
75
- end # automatically closed
76
- ```
77
-
78
- #### Pattern 2: Instance with at_exit
79
-
80
- ```ruby
81
- browser = Puppeteer::Bidi.launch_browser_instance(headless: true)
82
- at_exit { browser.close }
83
-
84
- Sync do
85
-   page = browser.new_page
86
-   page.goto("https://example.com")
87
- end
88
- ```
89
-
90
- #### Pattern 3: Inside existing Async context
91
-
92
- ```ruby
93
- Sync do
94
-   # No ReactorRunner used - browser is returned directly
95
-   browser = Puppeteer::Bidi.launch_browser_instance(headless: true)
96
-   page = browser.new_page
97
-   # ...
98
-   browser.close
99
- end
100
- ```
101
-
102
- ### Implementation Notes
103
-
104
- 1. **Thread Safety**: `Async::Queue` handles cross-thread communication safely
105
- 2. **Proxyable Check**: Only `Puppeteer::Bidi::*` objects (excluding Core layer) are wrapped in Proxy
106
- 3. **Error Handling**: Errors in reactor are propagated back to calling thread via `Async::Promise`
107
- 4. **Type Annotations**: Return type is `Browser` (Proxy is an internal detail)
108
-
109
- ### Reference
110
-
111
- This pattern is inspired by [async-webdriver](https://github.com/socketry/async-webdriver) by Samuel Williams (author of socketry/async).