ttp-agent-sdk 2.30.0 → 2.32.0

package/README.md CHANGED
@@ -286,6 +286,130 @@ const voiceSDK = new VoiceSDK({
286
286
  }
287
287
  ```
288
288
 
289
+ ## Capture Screenshot Tool (`capture_screen`)
290
+
291
+ The `capture_screen` tool allows AI agents to capture screenshots of the browser page during conversations. It uses `html2canvas` to render the DOM as an image.
292
+
293
+ ### Capturing the Entire Screen
294
+
295
+ When capturing the entire screen, the tool has two modes:
296
+
297
+ 1. **Viewport only** (default) - Captures only the currently visible portion of `document.body`
298
+ 2. **Full scrollable page** - When `fullPage: true`, captures the entire page including all content below the fold
299
+
300
+ **Example: Capture entire visible viewport**
301
+ ```javascript
302
+ // The agent can call this tool with:
303
+ {
304
+ tool: "capture_screen",
305
+ params: {
306
+ format: "jpeg", // 'png' or 'jpeg' (jpeg is smaller)
307
+ quality: 0.85, // JPEG quality 0-1 (only for jpeg)
308
+ scale: 1, // Resolution scale (1 = normal, 2 = retina)
309
+ maxWidth: 1280, // Max width in pixels (auto-resizes if larger)
310
+ maxHeight: 1280 // Max height in pixels (auto-resizes if larger)
311
+ }
312
+ }
313
+ ```
314
+
315
+ **Example: Capture full scrollable page**
316
+ ```javascript
317
+ // To capture the entire page including content below the fold:
318
+ {
319
+ tool: "capture_screen",
320
+ params: {
321
+ fullPage: true, // Capture entire scrollable page
322
+ format: "jpeg",
323
+ quality: 0.85,
324
+ maxWidth: 1920, // Higher limits for full page
325
+ maxHeight: 5000 // Accommodate long pages
326
+ }
327
+ }
328
+ ```
329
+
330
+ **How Full Page Capture Works:**
331
+
332
+ When `fullPage: true` is set (and no `selector` is provided), the tool:
333
+ - Sets `windowHeight` and `height` to `document.documentElement.scrollHeight` (total page height)
334
+ - Sets `y: 0` to start from the top
335
+ - Captures the entire scrollable content, not just the visible viewport
336
+ - The resulting image height equals the full scrollable height of the page
337
+
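The option changes listed above can be sketched as a small helper. This is only an illustration under stated assumptions: `buildCaptureOptions` and its `doc` argument are hypothetical names, not part of the SDK, and the SDK's real implementation may differ. The keys `height`, `windowHeight`, `y`, `scale`, and `useCORS` are standard `html2canvas` options.

```javascript
// Hypothetical helper (not an SDK API): builds html2canvas options
// following the full-page rules described above.
// `doc` is the page's `document`; it is passed in so the logic can be
// exercised outside a browser.
function buildCaptureOptions(params, doc) {
  const options = {
    useCORS: true,            // try to load cross-origin images via CORS
    scale: params.scale ?? 1, // resolution scale factor
  };

  // Full-page mode only applies when no selector is given.
  if (params.fullPage && !params.selector) {
    const fullHeight = doc.documentElement.scrollHeight;
    options.height = fullHeight;       // canvas height = total page height
    options.windowHeight = fullHeight; // render as if the window were that tall
    options.y = 0;                     // start from the top of the page
  }

  return options;
}

// In a browser, the result could be passed to html2canvas, for example:
//   html2canvas(document.body, buildCaptureOptions({ fullPage: true }, document))
//     .then((canvas) => { /* resize and encode */ });
```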
338
+ **Example: Capture specific element**
339
+ ```javascript
340
+ // Capture a specific element using CSS selector:
341
+ {
342
+ tool: "capture_screen",
343
+ params: {
344
+ selector: "#my-element", // CSS selector (e.g., "#header", ".content", "main")
345
+ format: "png", // PNG preserves transparency
346
+ scale: 2 // 2x resolution for crisp screenshots
347
+ }
348
+ }
349
+ ```
350
+
351
+ #### Tool Parameters
352
+
353
+ | Parameter | Type | Default | Description |
354
+ |-----------|------|---------|-------------|
355
+ | `selector` | string | `null` | CSS selector for a specific element. If omitted, captures `document.body` (the visible viewport, or the full scrollable page when `fullPage` is `true`). |
356
+ | `format` | string | `'jpeg'` | Output format: `'png'` or `'jpeg'`. JPEG is smaller, PNG supports transparency. |
357
+ | `quality` | number | `0.85` | JPEG quality (0-1). Only used when `format` is `'jpeg'`. |
358
+ | `scale` | number | `1` | Resolution scale factor. `1` = normal, `2` = retina/2x resolution. |
359
+ | `maxWidth` | number | `1280` | Maximum width in pixels. Image will be resized if larger (maintains aspect ratio). |
360
+ | `maxHeight` | number | `1280` | Maximum height in pixels. Image will be resized if larger (maintains aspect ratio). |
361
+ | `fullPage` | boolean | `false` | When `true` and no `selector` provided, captures entire scrollable page (not just viewport). |
362
+
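As a rough sketch of how the defaults in the table combine with caller-supplied values, the helper below merges and sanity-checks parameters. `CAPTURE_DEFAULTS` and `withCaptureDefaults` are hypothetical names used for illustration; whether the SDK validates parameters this way is an assumption.

```javascript
// Defaults copied from the parameter table above.
const CAPTURE_DEFAULTS = Object.freeze({
  selector: null,
  format: 'jpeg',
  quality: 0.85,
  scale: 1,
  maxWidth: 1280,
  maxHeight: 1280,
  fullPage: false,
});

// Hypothetical helper (not an SDK API): fills in defaults and rejects
// values outside the documented ranges.
function withCaptureDefaults(params = {}) {
  const merged = { ...CAPTURE_DEFAULTS, ...params };

  if (merged.format !== 'png' && merged.format !== 'jpeg') {
    throw new RangeError(`Unsupported format: ${merged.format}`);
  }
  if (merged.quality < 0 || merged.quality > 1) {
    throw new RangeError('quality must be between 0 and 1');
  }
  return merged;
}
```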
363
+ #### Return Value
364
+
365
+ The tool returns a result object with:
366
+
367
+ ```javascript
368
+ {
369
+ image: "base64_encoded_image_data", // Base64 string (without data URI prefix)
370
+ mimeType: "image/jpeg", // MIME type: "image/jpeg" or "image/png"
371
+ width: 1280, // Final image width in pixels
372
+ height: 720, // Final image height in pixels
373
+ captureTimeMs: 234, // Time taken to capture in milliseconds
374
+ selector: "body" // Element that was captured (or selector used)
375
+ }
376
+ ```
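Because `image` is returned without the data URI prefix, a consumer that wants to display the screenshot has to add the prefix itself. A minimal sketch, assuming a result object shaped like the one above (`toDataUri` and `estimateBytes` are hypothetical helper names, not SDK functions):

```javascript
// Builds a data URI from a capture result so it can be used as an <img> src.
function toDataUri(result) {
  return `data:${result.mimeType};base64,${result.image}`;
}

// Estimates the decoded size in bytes of a padded base64 string:
// every 4 base64 characters encode 3 bytes, minus any '=' padding.
function estimateBytes(base64) {
  const padding = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0;
  return (base64.length * 3) / 4 - padding;
}

// Example usage in a browser:
//   document.querySelector('img#preview').src = toDataUri(result);
```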
377
+
378
+ #### Events
379
+
380
+ The SDK emits events when screenshots are captured:
381
+
382
+ ```javascript
383
+ voiceSDK.on('screenshotCaptured', (data) => {
384
+ console.log(`Screenshot captured: ${data.width}x${data.height}, ${data.sizeKB}KB`);
385
+ console.log(`Element: ${data.selector}`);
386
+ });
387
+
388
+ voiceSDK.on('screenshotError', (error) => {
389
+ console.error('Screenshot failed:', error.error);
390
+ });
391
+ ```
392
+
393
+ #### How It Works Internally
394
+
395
+ The tool uses `html2canvas` to render the DOM:
396
+
397
+ 1. **Target Selection**: If `selector` is provided, captures that element. Otherwise captures `document.body`.
398
+ 2. **Full Page Mode**: When `fullPage: true` and no selector:
399
+ - Sets canvas height to `document.documentElement.scrollHeight`
400
+ - Captures from `y: 0` to capture entire scrollable area
401
+ 3. **Rendering**: `html2canvas` renders the DOM to a canvas element
402
+ 4. **Resizing**: If the rendered dimensions exceed `maxWidth` or `maxHeight`, the image is scaled down while maintaining its aspect ratio
403
+ 5. **Encoding**: Canvas is converted to base64 (JPEG or PNG format)
404
+
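The resizing step (step 4) amounts to picking a single scale ratio that satisfies both limits. The sketch below shows one common way to compute it; `fitWithin` is a hypothetical name, and the SDK's exact rounding behavior is an assumption.

```javascript
// Returns the largest dimensions that fit inside maxWidth x maxHeight
// while keeping the original aspect ratio. Images that already fit are
// returned unchanged (the ratio is capped at 1, so nothing is upscaled).
function fitWithin(width, height, maxWidth, maxHeight) {
  const ratio = Math.min(1, maxWidth / width, maxHeight / height);
  return {
    width: Math.round(width * ratio),
    height: Math.round(height * ratio),
  };
}

// A 1920x1080 capture with the default 1280x1280 limits becomes 1280x720.
console.log(fitWithin(1920, 1080, 1280, 1280));
```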
405
+ #### Important Notes
406
+
407
+ - **Full Page Capture**: When `fullPage: true` is set, the tool captures the entire scrollable height (`document.documentElement.scrollHeight`), not just the visible viewport. This includes all content below the fold.
408
+ - **Performance**: Full page captures may take longer, especially for very long pages. Consider lowering `maxHeight` to limit the size of the returned image.
409
+ - **Image Size**: Screenshots are automatically resized if they exceed `maxWidth` or `maxHeight` while maintaining aspect ratio.
410
+ - **Cross-Origin**: The tool handles cross-origin images when possible (`useCORS: true`).
411
+ - **Browser Compatibility**: Requires modern browsers with Canvas API support (Chrome 66+, Firefox 60+, Safari 11.1+, Edge 79+).
412
+
289
413
  ## Examples
290
414
 
291
415
  See the `examples/` directory for complete usage examples: