@dyyz1993/agent-browser 0.9.2 → 0.11.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (116)
  1. package/dist/__tests__/utils/parseCli.d.ts +1 -0
  2. package/dist/__tests__/utils/parseCli.d.ts.map +1 -1
  3. package/dist/__tests__/utils/parseCli.js +18 -10
  4. package/dist/__tests__/utils/parseCli.js.map +1 -1
  5. package/dist/actions.d.ts.map +1 -1
  6. package/dist/actions.js +63 -3
  7. package/dist/actions.js.map +1 -1
  8. package/dist/browser.d.ts +46 -2
  9. package/dist/browser.d.ts.map +1 -1
  10. package/dist/browser.js +343 -13
  11. package/dist/browser.js.map +1 -1
  12. package/dist/cli/commands.d.ts.map +1 -1
  13. package/dist/cli/commands.js +8 -3
  14. package/dist/cli/commands.js.map +1 -1
  15. package/dist/cli/connection.d.ts.map +1 -1
  16. package/dist/cli/connection.js +39 -1
  17. package/dist/cli/connection.js.map +1 -1
  18. package/dist/cli/help.d.ts.map +1 -1
  19. package/dist/cli/help.js +27 -20
  20. package/dist/cli/help.js.map +1 -1
  21. package/dist/cli/output.d.ts.map +1 -1
  22. package/dist/cli/output.js +5 -0
  23. package/dist/cli/output.js.map +1 -1
  24. package/dist/cli.js +20 -0
  25. package/dist/cli.js.map +1 -1
  26. package/dist/daemon.d.ts.map +1 -1
  27. package/dist/daemon.js +147 -1
  28. package/dist/daemon.js.map +1 -1
  29. package/dist/message-bridge.d.ts.map +1 -1
  30. package/dist/message-bridge.js +22 -4
  31. package/dist/message-bridge.js.map +1 -1
  32. package/dist/openapi.d.ts +22 -0
  33. package/dist/openapi.d.ts.map +1 -0
  34. package/dist/openapi.js +382 -0
  35. package/dist/openapi.js.map +1 -0
  36. package/dist/protocol.d.ts.map +1 -1
  37. package/dist/protocol.js +18 -0
  38. package/dist/protocol.js.map +1 -1
  39. package/dist/recorder/inject.js +61 -134
  40. package/dist/stream-server-standalone.d.ts +10 -0
  41. package/dist/stream-server-standalone.d.ts.map +1 -1
  42. package/dist/stream-server-standalone.js +594 -74
  43. package/dist/stream-server-standalone.js.map +1 -1
  44. package/dist/stream-server.d.ts +67 -2
  45. package/dist/stream-server.d.ts.map +1 -1
  46. package/dist/stream-server.js +371 -51
  47. package/dist/stream-server.js.map +1 -1
  48. package/dist/swagger-ui.d.ts +6 -0
  49. package/dist/swagger-ui.d.ts.map +1 -0
  50. package/dist/swagger-ui.js +51 -0
  51. package/dist/swagger-ui.js.map +1 -0
  52. package/dist/test-live.d.ts +2 -0
  53. package/dist/test-live.d.ts.map +1 -0
  54. package/dist/test-live.js +333 -0
  55. package/dist/test-live.js.map +1 -0
  56. package/dist/types.d.ts +7 -1
  57. package/dist/types.d.ts.map +1 -1
  58. package/dist/types.js.map +1 -1
  59. package/dist/viewer-html.d.ts.map +1 -1
  60. package/dist/viewer-html.js +270 -58
  61. package/dist/viewer-html.js.map +1 -1
  62. package/dist/viewer-script.d.ts +20 -2
  63. package/dist/viewer-script.d.ts.map +1 -1
  64. package/dist/viewer-script.js +911 -154
  65. package/dist/viewer-script.js.map +1 -1
  66. package/package.json +1 -1
  67. package/scripts/postinstall.js +6 -32
  68. package/scripts/test-cli-help.sh +51 -0
  69. package/scripts/verify-form.sh +67 -0
  70. package/scripts/verify-login.sh +65 -0
  71. package/scripts/verify-recording.sh +80 -0
  72. package/scripts/verify-upload.sh +41 -0
  73. package/skills/agent-browser/SKILL.md +297 -160
  74. package/skills/agent-browser/references/commands.md +3 -0
  75. package/skills/agent-browser/references/mobile-viewer.md +188 -0
  76. package/skills/agent-browser/references/network-monitoring.md +232 -0
  77. package/skills/agent-browser/references/recorder.md +319 -0
  78. package/skills/agent-browser/references/viewer-mode.md +148 -0
  79. package/skills/agent-browser/templates/api-interception.sh +3 -1
  80. package/skills/agent-browser/templates/data-extraction.sh +8 -4
  81. package/skills/agent-browser/templates/form-automation.sh +18 -23
  82. package/skills/agent-browser/templates/network-intercept-crawl.sh +256 -0
  83. package/skills/agent-browser/templates/recorder-workflow.sh +51 -0
  84. package/skills/agent-browser/templates/viewer-remote.sh +41 -0
  85. package/dist/__tests__/test-iframe.d.ts +0 -2
  86. package/dist/__tests__/test-iframe.d.ts.map +0 -1
  87. package/dist/__tests__/test-iframe.js +0 -52
  88. package/dist/__tests__/test-iframe.js.map +0 -1
  89. package/dist/cli-new.d.ts +0 -3
  90. package/dist/cli-new.d.ts.map +0 -1
  91. package/dist/cli-new.js +0 -308
  92. package/dist/cli-new.js.map +0 -1
  93. package/dist/cli-old.d.ts +0 -3
  94. package/dist/cli-old.d.ts.map +0 -1
  95. package/dist/cli-old.js +0 -1101
  96. package/dist/cli-old.js.map +0 -1
  97. package/dist/recorder/binding.d.ts +0 -24
  98. package/dist/recorder/binding.d.ts.map +0 -1
  99. package/dist/recorder/binding.js +0 -215
  100. package/dist/recorder/binding.js.map +0 -1
  101. package/dist/recorder/index.d.ts +0 -4
  102. package/dist/recorder/index.d.ts.map +0 -1
  103. package/dist/recorder/index.js +0 -4
  104. package/dist/recorder/index.js.map +0 -1
  105. package/dist/recorder/recorder.d.ts +0 -19
  106. package/dist/recorder/recorder.d.ts.map +0 -1
  107. package/dist/recorder/recorder.js +0 -101
  108. package/dist/recorder/recorder.js.map +0 -1
  109. package/dist/recorder/store.d.ts +0 -22
  110. package/dist/recorder/store.d.ts.map +0 -1
  111. package/dist/recorder/store.js +0 -150
  112. package/dist/recorder/store.js.map +0 -1
  113. package/dist/recorder/types.d.ts +0 -73
  114. package/dist/recorder/types.d.ts.map +0 -1
  115. package/dist/recorder/types.js +0 -5
  116. package/dist/recorder/types.js.map +0 -1
@@ -0,0 +1,188 @@
1
+ # Mobile Remote Control (Viewer Mode)
2
+
3
+ ## Overview
4
+
5
+ When the agent-browser viewer is opened on a **touch device** (phone, tablet), it automatically enters **mobile mode** with a touch-optimized UI. This is distinct from iOS Simulator mode — it works on ANY phone/tablet browser via the web viewer, requiring no simulator installation.
6
+
7
+ ## Touchpad System
8
+
9
+ The touchpad occupies the bottom portion of the viewer screen and simulates mouse input on the remote browser:
10
+
11
+ | Gesture | Action | Visual Feedback |
12
+ | ------------------- | ------------------------------------ | --------------------------------------- |
13
+ | Single tap | Click at virtual cursor position | Cursor flashes red briefly |
14
+ | Single finger drag | Move virtual cursor on remote screen | Cursor follows finger |
15
+ | Long press (~800ms) | Enter drag mode (hold mouse down) | Cursor turns orange, shows "DRAG" badge |
16
+ | Two-finger drag | Scroll wheel (vertical/horizontal) | Shows "SCROLL" badge |
17
+ | Two-finger release | Momentum scroll (deceleration) | Smooth deceleration after release |
18
+
19
+ **Implementation details:**
20
+
21
+ - All touch listeners use `{ passive: false }` + `preventDefault()` to prevent browser gestures
22
+ - Movement uses acceleration curve for natural feel (`computeAcceleration()`)
23
+ - Scroll uses separate wheel acceleration (`computeWheelAccel()`)
24
+ - Cooldown period after two-finger scroll prevents accidental clicks
25
+ - Momentum scroll uses RAF loop with 0.92 decay factor
26
+
27
+ ## Virtual Keyboard Toolbar
28
+
29
+ Collapsible toolbar at the top of the touchpad area:
30
+
31
+ | Button | Key Sent | Code |
32
+ | ----------- | ----------- | ------------ |
33
+ | Tab | Tab | `Tab` |
34
+ | Up Arrow | Arrow Up | `ArrowUp` |
35
+ | Left Arrow | Arrow Left | `ArrowLeft` |
36
+ | Down Arrow | Arrow Down | `ArrowDown` |
37
+ | Right Arrow | Arrow Right | `ArrowRight` |
38
+ | Enter | Enter | `Enter` |
39
+ | Backspace | Backspace | `Backspace` |
40
+ | Escape | Escape | `Escape` |
41
+
42
+ - **Collapsed state** (default): Shows only expand button (+ icon)
43
+ - **Expanded state**: Shows all 8 keys in wrapped layout
44
+ - Tap any key to send immediately to remote browser (no need to switch to keyboard app)
45
+
46
+ ## Text Input (Input Panel)
47
+
48
+ This is the key innovation for mobile remote control — typing text into remote input fields from your phone.
49
+
50
+ ### Flow Diagram
51
+
52
+ ```
53
+ User taps remote <input> on viewer screen
54
+
55
+ Daemon detects focus event via injected listener
56
+
57
+ Daemon sends {type: "input_focused", value: "...", ...} to viewer
58
+
59
+ Viewer enters INPUT MODE:
60
+ - Hides virtual cursor
61
+ - Shows #input-panel at screen bottom
62
+ - Pre-fills local input field with current value
63
+ - Sets window._currentTargetSelector for fill targeting
64
+
65
+ User types in local input field (with IME if needed)
66
+
67
+ Text syncs to remote via {type: "input_fill", text: "...", selector: "..."}
68
+
69
+ User taps Send (arrow icon) or presses Enter:
70
+ - Sends final input_fill + Enter keydown/keyup
71
+ - Exits input mode
72
+
73
+ OR user taps Escape or clicks outside panel:
74
+ - Sends input_blur_element to remote
75
+ - Exits input mode, restores touchpad
76
+ ```
77
+
78
+ ### IME / CJK Composition Support
79
+
80
+ Critical for Chinese, Japanese, Korean input methods:
81
+
82
+ | Event | Handling | Prevents |
83
+ | ------------------------ | -------------------------------------------------------- | ------------------------------------------- |
84
+ | `compositionstart` | Sets `_fieldComposing = true` | Intermediate pinyin sent to remote |
85
+ | `compositionupdate` | (ignored while composing) | Garbage characters |
86
+ | `compositionend` | Sets `_fieldComposing = false`, double-RAF deferred sync | Partial commits sent early |
87
+ | RAF poll (30ms interval) | Skips sync while `_fieldComposing === true` | Race condition with IME candidate selection |
88
+
89
+ **Key insight:** Only fully committed characters (after user selects from IME candidate list) are synced to the remote browser. Intermediate pinyin/kana composition is completely filtered out.
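The gating described in the table can be sketched as follows. Only the `_fieldComposing` flag name comes from the text; `createImeGate`, `send`, and `schedule` are hypothetical names, and the default `schedule` runs synchronously so the sketch works outside a browser. In the viewer, `schedule` would presumably wrap two nested `requestAnimationFrame` calls.

```javascript
// Hypothetical sketch of composition-aware syncing.
// Only the _fieldComposing flag name comes from the text; the rest is assumed.
function createImeGate(send, schedule = (fn) => fn()) {
  let _fieldComposing = false;
  return {
    onCompositionStart() {
      _fieldComposing = true; // stop forwarding intermediate pinyin/kana
    },
    onCompositionEnd(getValue) {
      _fieldComposing = false;
      // Defer the sync so the field holds the committed text when it is read.
      schedule(() => send(getValue()));
    },
    onInput(value) {
      if (_fieldComposing) return; // skip sync while composing
      send(value);
    },
  };
}

// Simulate typing "ni" in pinyin, committing a character, then typing "a".
const sent = [];
const gate = createImeGate((text) => sent.push(text));
gate.onCompositionStart();
gate.onInput("n");
gate.onInput("ni");
gate.onCompositionEnd(() => "你");
gate.onInput("你a");
console.log(sent);
```

In this run only the committed value and the later plain keystroke reach `send`; the intermediate "n" and "ni" never do.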
90
+
91
+ ### Input Panel Layout
92
+
93
+ ```
94
+ ┌─────────────────────────────────────────┐
95
+ │ target: input[type="email"] │ <- label row
96
+ ├─────────────────────────────────────────┤
97
+ │ [________________________] [>] │ <- input + send button
98
+ └─────────────────────────────────────────┘
99
+ ```
100
+
101
+ - Label shows: input type + placeholder (if different from value)
102
+ - Input field: `border-radius: 18px`, `font-size:16px` (prevents iOS zoom)
103
+ - Send button: Blue circle with arrow SVG icon
104
+ - Dismissal: Tap outside panel or press Escape
105
+
106
+ ### Keyboard Awareness on Mobile
107
+
108
+ On mobile devices, the viewer intentionally suppresses keyboard-related events to prevent interference:
109
+
110
+ - `hiddenInput` (#hidden-input) is **NOT created** on touch devices (unlike desktop mode)
111
+ - Document-level `keydown`/`keyup` listeners check `event.target` — ignores events from `#input-field`
112
+ - This allows the native mobile keyboard to work normally for text input without conflicting with remote keyboard forwarding
113
+
114
+ ## DeviceMode Dynamic Switching
115
+
116
+ The viewer does not detect the device type only once at startup. It uses a reactive architecture that can switch modes at runtime:
117
+
118
+ ### Detection Function
119
+
120
+ ```javascript
121
+ function detectDeviceMode() {
122
+ var uaMatch = /iphone|ipod|android(?=.*mobile)|mobile|tablet|ipad/i.test(navigator.userAgent);
123
+ var hasTouch = 'ontouchstart' in window || navigator.maxTouchPoints > 0;
124
+ return uaMatch || hasTouch ? 'mobile' : 'desktop';
125
+ }
126
+ ```
127
+
128
+ ### Singleton Architecture
129
+
130
+ ```javascript
131
+ const DeviceMode = {
132
+ _current: detectDeviceMode(), // Initial detection
133
+ _listeners: [], // Change callbacks
134
+
135
+ get current() {
136
+ return this._current;
137
+ },
138
+
139
+ onModeChange(fn) {
140
+ this._listeners.push(fn);
141
+ },
142
+
143
+ switchTo(mode) {
144
+ if (mode === this._current) return; // No-op for same mode
145
+ var prev = this._current;
146
+ this._current = mode;
147
+ if (mode === 'desktop') {
148
+ MobileModule.detach(); // Hide touchpad, show cursor
149
+ DesktopModule.attach(); // Create hiddenInput, focus it
150
+ } else {
151
+ DesktopModule.detach(); // Remove hiddenInput
152
+ MobileModule.attach(); // Show touchpad, init cursor
153
+ }
154
+ this._listeners.forEach((fn) => fn(mode, prev));
155
+ },
156
+ };
157
+ ```
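One way to exercise this switching logic outside a browser is to substitute stub modules. In the sketch below, `makeStubModule` and the `log` array are hypothetical scaffolding, and the initial mode is hard-coded to `'desktop'` instead of calling `detectDeviceMode()`; the `switchTo` body mirrors the snippet above.

```javascript
// Stubs stand in for the real modules, which manipulate the DOM.
const log = [];
const makeStubModule = (name) => ({
  attach: () => log.push(name + ".attach"),
  detach: () => log.push(name + ".detach"),
});
const MobileModule = makeStubModule("Mobile");
const DesktopModule = makeStubModule("Desktop");

const DeviceMode = {
  _current: "desktop", // assumed start state; the viewer calls detectDeviceMode()
  _listeners: [],
  get current() {
    return this._current;
  },
  onModeChange(fn) {
    this._listeners.push(fn);
  },
  switchTo(mode) {
    if (mode === this._current) return; // no-op for same mode
    const prev = this._current;
    this._current = mode;
    if (mode === "desktop") {
      MobileModule.detach();
      DesktopModule.attach();
    } else {
      DesktopModule.detach();
      MobileModule.attach();
    }
    this._listeners.forEach((fn) => fn(mode, prev));
  },
};

DeviceMode.onModeChange((mode, prev) => log.push(prev + "->" + mode));
DeviceMode.switchTo("mobile");
DeviceMode.switchTo("mobile"); // ignored: already in mobile mode
console.log(log);
```

The log shows the ordering the design relies on: the outgoing module is detached before the incoming one is attached, and listeners are notified last.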
158
+
159
+ ### Module Lifecycle
160
+
161
+ **DesktopModule** (PC mode):
162
+
163
+ - `attach()`: Creates invisible `#hidden-input`, focuses it (captures keyboard for remote forwarding)
164
+ - `detach()`: Blurs and removes hiddenInput
165
+
166
+ **MobileModule** (touch mode):
167
+
168
+ - `attach()`: Shows touchpad (display:flex), initializes virtual cursor, sets up toolbar
169
+ - `detach()`: Hides input-panel, shows cursor again
170
+
171
+ ### Auto-Switching Triggers
172
+
173
+ | Trigger | Handler | Use Case |
174
+ | --------------------------------------- | ------------------------- | ------------------------------------------------- |
175
+ | `resize` event | Debounced 100ms re-detect | Phone rotation, window resize |
176
+ | `orientationchange` | Delayed 200ms re-detect | Portrait<->Landscape |
177
+ | `matchMedia("(pointer:coarse)")` change | Immediate switch | Stylus connect/disconnect, tablet keyboard attach |
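The three triggers in the table could be wired up roughly as shown below. The 100 ms and 200 ms delays come from the table; the `debounce` helper, its injectable `timers` argument, and the `redetect` callback are assumptions made for this sketch, and the browser wiring is skipped when no `window` is present.

```javascript
// Hypothetical debounce helper; timers are injectable so it can be tested without waiting.
function debounce(fn, waitMs, timers) {
  const t = timers || {
    set: (f, ms) => setTimeout(f, ms),
    clear: (h) => clearTimeout(h),
  };
  let handle = null;
  return (...args) => {
    if (handle !== null) t.clear(handle); // restart the wait on every call
    handle = t.set(() => {
      handle = null;
      fn(...args);
    }, waitMs);
  };
}

// Placeholder: in the viewer this would call detectDeviceMode() and DeviceMode.switchTo().
function redetect() {
  console.log("re-detecting device mode");
}

if (typeof window !== "undefined") {
  window.addEventListener("resize", debounce(redetect, 100));
  window.addEventListener("orientationchange", () => setTimeout(redetect, 200));
  window.matchMedia("(pointer: coarse)").addEventListener("change", redetect);
}
```

Debouncing `resize` matters because a rotation or window drag fires many events in quick succession, and only the final layout is worth re-detecting.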
178
+
179
+ ## Mobile-Specific CSS Considerations
180
+
181
+ | Issue | Solution |
182
+ | -------------------------------------- | --------------------------------------------------------- |
183
+ | iOS keyboard pushes content up | `min/max-height: 100dvh` on html/body, `position: fixed` |
184
+ | VisualViewport API for keyboard height | Listener resizes input panel above keyboard |
185
+ | iOS auto-scroll during input | `setInterval` scroll guard (100ms) fights browser scroll |
186
+ | Browser gesture conflicts | `touch-action: none` on body during input mode |
187
+ | Safe area (notch phones) | `padding-bottom: env(safe-area-inset-bottom)` on touchpad |
188
+ | Small tap targets | Minimum 44px height on buttons (iOS guideline) |
@@ -0,0 +1,232 @@
1
+ # Network Request Monitoring
2
+
3
+ The `network` command intercepts and monitors network requests. Use it to test APIs, block unwanted requests, mock responses, and debug network behavior.
4
+
5
+ ## Basic Network Monitoring
6
+
7
+ ### View All Network Requests
8
+
9
+ ```bash
10
+ # Start monitoring network requests
11
+ agent-browser network requests
12
+
13
+ # Clear request history
14
+ agent-browser network requests --clear
15
+
16
+ # Filter requests by URL pattern
17
+ agent-browser network requests --filter "**/api/**"
18
+ agent-browser network requests --filter "**/json"
19
+ ```
20
+
21
+ ### Example: Monitor API Calls
22
+
23
+ ```bash
24
+ # Open a page
25
+ agent-browser open https://httpbin.org/delay/1
26
+
27
+ # View all network requests made
28
+ agent-browser network requests
29
+
30
+ # Filter to requests whose URL ends in /json
31
+ agent-browser network requests --filter "**/json"
32
+ ```
33
+
34
+ ## Request Interception (Routing)
35
+
36
+ ### Mock API Responses
37
+
38
+ ```bash
39
+ # Set up a mock response for a URL pattern
40
+ agent-browser network route "**/api/users" --body '{"users": [{"id": 1, "name": "Mock User"}]}'
41
+
42
+ # Now any request to /api/users will return the mock data
43
+ agent-browser open https://example.com
44
+
45
+ # Remove the route
46
+ agent-browser network unroute "**/api/users"
47
+ ```
48
+
49
+ ### Block Unwanted Requests
50
+
51
+ ```bash
52
+ # Block ads or tracking scripts
53
+ agent-browser network route "**/ads/**" --abort
54
+ agent-browser network route "**/tracking/**" --abort
55
+
56
+ # Block specific domains
57
+ agent-browser network route "**/analytics.google.com/**" --abort
58
+
59
+ # Remove block
60
+ agent-browser network unroute "**/ads/**"
61
+ ```
62
+
63
+ ## Recording Network Activity
64
+
65
+ ### During Recorder Session
66
+
67
+ ```bash
68
+ # Start recording session
69
+ agent-browser recorder start --session network-test
70
+
71
+ # Navigate and perform actions
72
+ agent-browser open https://httpbin.org/get
73
+ agent-browser open https://httpbin.org/json
74
+
75
+ # View network requests during recording
76
+ agent-browser network requests
77
+ agent-browser network requests --filter "**/json"
78
+
79
+ # Stop recording
80
+ agent-browser recorder stop --output network-test.yaml
81
+ ```
82
+
83
+ ## Advanced Patterns
84
+
85
+ ### Debug API Issues
86
+
87
+ ```bash
88
+ # 1. Clear previous requests
89
+ agent-browser network requests --clear
90
+
91
+ # 2. Navigate to trigger API calls
92
+ agent-browser open https://example.com/dashboard
93
+
94
+ # 3. Check what requests were made
95
+ agent-browser network requests
96
+
97
+ # 4. Filter for specific endpoints
98
+ agent-browser network requests --filter "**/api/v1/**"
99
+ ```
100
+
101
+ ### Test Error Handling
102
+
103
+ ```bash
104
+ # Mock an error payload in the response body
105
+ agent-browser network route "**/api/critical" --body '{"error": "Service unavailable"}'
106
+
107
+ # Or block the request entirely
108
+ agent-browser network route "**/api/critical" --abort
109
+
110
+ # Test how your app handles the error
111
+ agent-browser open https://example.com
112
+ ```
113
+
114
+ ### Performance Testing
115
+
116
+ ```bash
117
+ # Monitor requests while testing
118
+ agent-browser network requests --clear
119
+
120
+ # Perform actions
121
+ agent-browser click @e1
122
+ agent-browser wait --load networkidle
123
+
124
+ # Check how many requests were made
125
+ agent-browser network requests
126
+ ```
127
+
128
+ ## URL Pattern Matching
129
+
130
+ The routing uses glob patterns:
131
+
132
+ - `**/api/**` - Match any path containing /api/
133
+ - `**/api/users` - Match specific endpoint
134
+ - `**/*.json` - Match URLs ending in `.json`
135
+ - `https://example.com/**` - Match specific domain
136
+ - `**/ads/**` - Match any ad URLs
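To make the pattern semantics concrete, here is a rough sketch of how such globs could be translated into regular expressions. It assumes `**` matches any characters including `/`, while a single `*` stops at `/`; agent-browser's actual matcher may behave differently, and `globToRegExp` is a hypothetical helper, not part of the CLI.

```javascript
// Hypothetical glob-to-RegExp conversion for URL patterns.
// Assumes "**" crosses "/" boundaries and "*" does not.
function globToRegExp(glob) {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      if (glob[i + 1] === "*") {
        re += ".*";
        i++; // consume the second "*"
      } else {
        re += "[^/]*";
      }
    } else {
      // Escape characters that are special in regular expressions.
      re += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp("^" + re + "$");
}

const urls = [
  "https://example.com/api/users",
  "https://example.com/api/v1/items",
  "https://example.com/data/report.json",
];
for (const pattern of ["**/api/**", "**/api/users", "**/*.json"]) {
  const re = globToRegExp(pattern);
  console.log(pattern, urls.filter((u) => re.test(u)));
}
```

Under these assumptions, `**/api/**` selects both API URLs, `**/api/users` selects only the first, and `**/*.json` selects only the report file.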
137
+
138
+ ## Integration with Recorder
139
+
140
+ Network monitoring can be used while a recorder session is active:
141
+
142
+ ```bash
143
+ # Start recording with network monitoring
144
+ agent-browser recorder start --session my-test
145
+
146
+ # Your workflow
147
+ agent-browser open https://example.com
148
+ agent-browser snapshot -i
149
+ agent-browser click @e1
150
+
151
+ # Check network requests
152
+ agent-browser network requests
153
+
154
+ # Stop and save
155
+ agent-browser recorder stop --output test-with-network.yaml
156
+ ```
157
+
158
+ ## Best Practices
159
+
160
+ 1. **Clear before testing**: Use `--clear` to start fresh
161
+ ```bash
162
+ agent-browser network requests --clear
163
+ ```
164
+
165
+ 2. **Filter effectively**: Use specific patterns to reduce noise
166
+ ```bash
167
+ agent-browser network requests --filter "**/api/v2/**"
168
+ ```
169
+
170
+ 3. **Clean up routes**: Always remove test routes
171
+ ```bash
172
+ agent-browser network unroute "**/test/**"
173
+ ```
174
+
175
+ 4. **Combine with wait**: Wait for network idle before listing requests so that late requests are included
176
+ ```bash
177
+ agent-browser click @e1
178
+ agent-browser wait --load networkidle
179
+ agent-browser network requests
180
+ ```
181
+
182
+ ## Use Cases
183
+
184
+ - **API Testing**: Mock responses and test error handling
185
+ - **Performance**: Monitor request count and patterns
186
+ - **Debugging**: See what requests your app makes
187
+ - **Ad Blocking**: Block unwanted requests during testing
188
+ - **Offline Testing**: Block external dependencies
189
+ - **Security**: Audit what data is being sent
190
+
191
+ ## Example Test Script
192
+
193
+ ```bash
194
+ #!/bin/bash
195
+
196
+ # Test network monitoring with httpbin.org
197
+
198
+ # Start browser
199
+ agent-browser open https://httpbin.org
200
+
201
+ # Clear previous requests
202
+ agent-browser network requests --clear
203
+
204
+ # Make some requests
205
+ agent-browser open https://httpbin.org/get
206
+ agent-browser open https://httpbin.org/json
207
+ agent-browser open https://httpbin.org/html
208
+
209
+ # Check all requests
210
+ agent-browser network requests
211
+
212
+ # Filter JSON requests
213
+ agent-browser network requests --filter "**/json"
214
+
215
+ # Test mocking
216
+ agent-browser network route "**/test" --body '{"mocked": true}'
217
+ agent-browser open https://httpbin.org/test
218
+ agent-browser network unroute "**/test"
219
+
220
+ # Test blocking
221
+ agent-browser network route "**/blocked" --abort
222
+
223
+ # Clean up
224
+ agent-browser close
225
+ ```
226
+
227
+ ## Limitations
228
+
229
+ - Routes are session-specific and reset on browser close
230
+ - Request history is stored in memory and cleared on browser close
231
+ - Mock responses only work for simple JSON bodies
232
+ - For complex mocking, consider using a dedicated API mocking service