npm - @telnyx/voice-agent-tester - Versions diffs - 0.4.1 → 0.4.4 - Mend

@telnyx/voice-agent-tester 0.4.1 → 0.4.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

package/CHANGELOG.md +30 -0
package/README.md +28 -1
package/applications/telnyx.yaml +4 -3
package/applications/vapi.yaml +0 -4
package/javascript/audio_input_hooks.js +89 -19
package/package.json +1 -1
package/src/index.js +63 -18
package/src/voice-agent-tester.js +224 -4
package/tests/integration.test.js +4 -3

package/CHANGELOG.md CHANGED

@@ -1,5 +1,35 @@
 # Changelog
+## [0.4.4](https://github.com/team-telnyx/voice-agent-tester/compare/v0.4.3...v0.4.4) (2026-03-11)
+### Features
+* fix speechend race condition, add --retries flag ([#21](https://github.com/team-telnyx/voice-agent-tester/issues/21)) ([09e3b65](https://github.com/team-telnyx/voice-agent-tester/commit/09e3b6578face6c407d058991ab5495d9463e544))
+### Chores
+* release v0.4.3 ([#20](https://github.com/team-telnyx/voice-agent-tester/issues/20)) ([bdeb87b](https://github.com/team-telnyx/voice-agent-tester/commit/bdeb87bed502919a9fed9950e69242b1c2aefcfc))
+## [0.4.3](https://github.com/team-telnyx/voice-agent-tester/compare/v0.4.2...v0.4.3) (2026-03-11)
+### Features
+* add click_with_retry action and fix audio event race conditions ([#19](https://github.com/team-telnyx/voice-agent-tester/issues/19)) ([#19](https://github.com/team-telnyx/voice-agent-tester/issues/19)) ([13e2009](https://github.com/team-telnyx/voice-agent-tester/commit/13e2009a94b4e2f7e05972f01a47c9b31758bf58))
+### Chores
+* release v0.4.2 ([#18](https://github.com/team-telnyx/voice-agent-tester/issues/18)) ([1cf64ef](https://github.com/team-telnyx/voice-agent-tester/commit/1cf64ef563e813c2f06b2b655bfcc414637594cb))
+## [0.4.2](https://github.com/team-telnyx/voice-agent-tester/compare/v0.4.1...v0.4.2) (2026-02-23)
+### Features
+* add dashboard hints for Vapi and ElevenLabs comparison mode params ([#16](https://github.com/team-telnyx/voice-agent-tester/issues/16)) ([7fda40b](https://github.com/team-telnyx/voice-agent-tester/commit/7fda40b6971a968dde1fc1c3466662227a3bc77e))
+### Chores
+* improve event logs and comparison mode docs ([#17](https://github.com/team-telnyx/voice-agent-tester/issues/17)) ([24a9683](https://github.com/team-telnyx/voice-agent-tester/commit/24a968337a0b4a6c2d6baddd0aa507d5a87c9488))
 ## [0.4.1](https://github.com/team-telnyx/voice-agent-tester/compare/v0.4.0...v0.4.1) (2026-02-18)
 ### Features

package/README.md CHANGED

@@ -31,6 +31,8 @@ voice-agent-tester -a applications/telnyx.yaml -s scenarios/appointment.yaml --a
 | `--provider` | | Import from provider (`vapi`, `elevenlabs`, `retell`) |
 | `--provider-api-key` | | External provider API key (required with `--provider`) |
 | `--provider-import-id` | | Provider assistant ID to import (required with `--provider`) |
+| `--share-key` | | Vapi share key for comparison mode (prompted if missing) |
+| `--branch-id` | | ElevenLabs branch ID for comparison mode (prompted if missing) |
 | `--compare` | `true` | Run both provider direct and Telnyx import benchmarks |
 | `--no-compare` | | Disable comparison (run only Telnyx import) |
 | `-d, --debug` | `false` | Enable detailed timeout diagnostics |
@@ -190,20 +192,45 @@ When importing from an external provider, the tool automatically runs both bench
 1. **Provider Direct** - Benchmarks the assistant on the original provider's widget
 2. **Telnyx Import** - Benchmarks the same assistant after importing to Telnyx
+### Provider-Specific Keys
+Comparison mode requires a provider-specific key to load the provider's direct widget. If not passed via CLI, the tool will prompt you with instructions on how to find it.
+| Provider | Flag | How to find it |
+|----------|------|----------------|
+| Vapi | `--share-key` | In the Vapi Dashboard, select your assistant, then click the link icon (🔗) next to the assistant ID at the top. This copies the demo link containing your share key. |
+| ElevenLabs | `--branch-id` | In the ElevenLabs Dashboard, go to Agents, select your target agent, then click the dropdown next to Publish and select "Copy shareable link". This copies the demo link containing your branch ID. |
 ### Import and Compare (Default)
+**Vapi:**
 ```bash
 npx @telnyx/voice-agent-tester@latest \
   -a applications/telnyx.yaml \
   -s scenarios/appointment.yaml \
   --provider vapi \
+  --share-key <VAPI_SHARE_KEY> \
   --api-key <TELNYX_KEY> \
   --provider-api-key <VAPI_KEY> \
   --provider-import-id <VAPI_ASSISTANT_ID>
 ```
+**ElevenLabs:**
+```bash
+npx @telnyx/voice-agent-tester@latest \
+  -a applications/telnyx.yaml \
+  -s scenarios/appointment.yaml \
+  --provider elevenlabs \
+  --branch-id <ELEVENLABS_BRANCH_ID> \
+  --api-key <TELNYX_KEY> \
+  --provider-api-key <ELEVENLABS_KEY> \
+  --provider-import-id <ELEVENLABS_AGENT_ID>
+```
 This will:
-- Run Phase 1: VAPI direct benchmark
+- Run Phase 1: Provider direct benchmark
 - Run Phase 2: Telnyx import benchmark
 - Generate a side-by-side latency comparison report
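The README now ties each provider to one extra comparison-mode flag. The rule that `getCompareRequiredParams` in `src/index.js` applies (shown later in this diff) can be sketched as a lookup table. This is a minimal illustration, not the package's code: `COMPARE_PARAMS` and `missingCompareParams` are hypothetical names, and the argv keys assume yargs' default camel-case expansion (`--share-key` becomes `shareKey`), which matches how the source reads `argv.shareKey` and `argv.branchId`.

```javascript
// Hypothetical lookup mirroring getCompareRequiredParams in src/index.js (0.4.4).
// Keys are the camel-cased argv names that yargs produces from the CLI flags.
const COMPARE_PARAMS = {
  vapi: { key: 'shareKey', flag: '--share-key', description: 'Vapi share key' },
  elevenlabs: { key: 'branchId', flag: '--branch-id', description: 'ElevenLabs branch ID' },
  // retell and other providers: no extra parameter required (as of 0.4.4)
};

// Returns the parameters still missing for comparison mode (empty when none).
function missingCompareParams(argv) {
  const param = COMPARE_PARAMS[argv.provider];
  if (!param || argv[param.key]) return [];
  return [param];
}

console.log(missingCompareParams({ provider: 'vapi' }).map((p) => p.flag)); // [ '--share-key' ]
console.log(missingCompareParams({ provider: 'elevenlabs', branchId: 'abc' })); // []
console.log(missingCompareParams({ provider: 'retell' })); // []
```

When this list is non-empty, the CLI prompts for each value (printing the new `hint` text) instead of failing outright.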

package/applications/telnyx.yaml CHANGED

@@ -4,7 +4,8 @@ steps:
     selector: "telnyx-ai-agent"
   - action: sleep
     time: 3000
-  - action: click
+  - action: click_with_retry
     selector: "telnyx-ai-agent >>> button"
-  - action: sleep
-    time: 4000
+    retries: 5
+    checkDelay: 4000
+    retryDelay: 5000
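The Telnyx application config replaces a plain `click` plus a fixed 4 s `sleep` with `click_with_retry`. Its three parameters are read by `handleClickWithRetry` in `src/voice-agent-tester.js` using `||` fallbacks. The sketch below mirrors that resolution and estimates the minimum time spent when every click lands but no connection is ever detected. The helper names are hypothetical, and the estimate ignores selector waits and page-evaluation time.

```javascript
// Mirrors the option resolution in handleClickWithRetry (0.4.4).
// Because of `||`, any falsy value (including an explicit 0) uses the default.
function resolveClickWithRetryOptions(step) {
  return {
    maxRetries: step.retries || 2,
    retryDelay: step.retryDelay || 3000,
    checkDelay: step.checkDelay || 4000,
  };
}

// Hypothetical estimate: each attempt sleeps checkDelay before checking the
// connection, and every attempt except the last also sleeps retryDelay.
function minTimeWithoutConnectionMs({ maxRetries, retryDelay, checkDelay }) {
  return maxRetries * checkDelay + (maxRetries - 1) * retryDelay;
}

const telnyxStep = {
  selector: 'telnyx-ai-agent >>> button',
  retries: 5,
  checkDelay: 4000,
  retryDelay: 5000,
};
const opts = resolveClickWithRetryOptions(telnyxStep);
console.log(opts); // { maxRetries: 5, retryDelay: 5000, checkDelay: 4000 }
console.log(minTimeWithoutConnectionMs(opts)); // 40000
console.log(resolveClickWithRetryOptions({ selector: 'x', retries: 0 }).maxRetries); // 2
```

With the values in this YAML, a run that never detects a connection spends at least 40 s in this step before "proceeding anyway", compared with the previous fixed 4 s sleep.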

package/applications/vapi.yaml CHANGED

@@ -13,7 +13,3 @@ steps:
     time: 2000
   - action: speak
     text: "Hello, what can you do?"
-  - action: wait_for_voice
-    metrics: elapsed_time
-  - action: wait_for_silence
-    metrics: elapsed_time

package/javascript/audio_input_hooks.js CHANGED

@@ -62,20 +62,24 @@ function createControlledMediaStream() {
 }
 // Replace getUserMedia to return our controlled stream
-const originalGetUserMedia = navigator.mediaDevices.getUserMedia.bind(navigator.mediaDevices);
-navigator.mediaDevices.getUserMedia = function (constraints) {
-  console.log("🎤 Intercepted getUserMedia call with constraints:", constraints);
-  // If audio is requested, return our controlled stream
-  if (constraints && constraints.audio) {
-    console.log("🎤 Returning controlled MediaStream instead of real microphone");
-    const controlledStream = createControlledMediaStream();
-    return Promise.resolve(controlledStream);
-  }
+if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
+  const originalGetUserMedia = navigator.mediaDevices.getUserMedia.bind(navigator.mediaDevices);
+  navigator.mediaDevices.getUserMedia = function (constraints) {
+    console.log("🎤 Intercepted getUserMedia call with constraints:", constraints);
+    // If audio is requested, return our controlled stream
+    if (constraints && constraints.audio) {
+      console.log("🎤 Returning controlled MediaStream instead of real microphone");
+      const controlledStream = createControlledMediaStream();
+      return Promise.resolve(controlledStream);
+    }
-  // For video-only or other requests, use original implementation
-  return originalGetUserMedia(constraints);
-};
+    // For video-only or other requests, use original implementation
+    return originalGetUserMedia(constraints);
+  };
+} else {
+  console.warn("🎤 navigator.mediaDevices.getUserMedia not available, skipping microphone intercept");
+}
 // Expose __speak method to be called from voice-agent-tester.js
 window.__speak = function (textOrUrl) {
@@ -152,6 +156,24 @@ function playAudioInMediaStream(url) {
   const audio = new Audio(url);
   audio.crossOrigin = 'anonymous'; // Enable CORS if needed
+  // Keep a strong reference so the element is not garbage collected
+  currentSpeakAudio = audio;
+  let speechEndFired = false;
+  let safetyTimeoutId = null;
+  function fireSpeechEnd(reason) {
+    if (speechEndFired) return;
+    speechEndFired = true;
+    if (safetyTimeoutId) clearTimeout(safetyTimeoutId);
+    console.log(`🎤 Audio playback ended (${reason})`);
+    if (typeof __publishEvent === 'function') {
+      __publishEvent('speechend', { url: url, reason: reason });
+    }
+    // Release reference
+    if (currentSpeakAudio === audio) currentSpeakAudio = null;
+  }
   // Set up audio routing through all MediaStreams
   audio.addEventListener('canplaythrough', function () {
     console.log(`🎤 Audio ready to play, routing to ${mediaStreams.length} MediaStreams`);
@@ -181,7 +203,33 @@ function playAudioInMediaStream(url) {
       }
       // Play the audio
-      audio.play();
+      audio.play().then(() => {
+        // Set up safety timeout based on audio duration
+        // audio.duration should be available after canplaythrough
+        const duration = audio.duration;
+        if (duration && isFinite(duration)) {
+          const safetyMs = Math.max((duration * 1000) + 5000, 15000);
+          console.log(`🎤 Audio duration: ${duration.toFixed(1)}s, safety timeout: ${(safetyMs / 1000).toFixed(1)}s`);
+          safetyTimeoutId = setTimeout(() => {
+            if (!speechEndFired) {
+              console.warn(`🎤 Safety timeout: speechend not fired after ${(safetyMs / 1000).toFixed(1)}s (audio paused=${audio.paused}, ended=${audio.ended}, currentTime=${audio.currentTime.toFixed(1)})`);
+              fireSpeechEnd('safety_timeout');
+            }
+          }, safetyMs);
+        } else {
+          // Unknown duration — use 20s fallback
+          console.warn('🎤 Audio duration unknown, using 20s safety timeout');
+          safetyTimeoutId = setTimeout(() => {
+            if (!speechEndFired) {
+              console.warn('🎤 Safety timeout: speechend not fired after 20s');
+              fireSpeechEnd('safety_timeout');
+            }
+          }, 20000);
+        }
+      }).catch(error => {
+        console.error('Error playing audio:', error);
+        fireSpeechEnd('play_error');
+      });
     } catch (error) {
       console.error('Error setting up audio source:', error);
       if (typeof __publishEvent === 'function') {
@@ -190,11 +238,19 @@ function playAudioInMediaStream(url) {
     }
   });
-  // Handle audio end
+  // Handle audio end — primary path
   audio.addEventListener('ended', function () {
-    console.log('🎤 Audio playback ended');
-    if (typeof __publishEvent === 'function') {
-      __publishEvent('speechend', { url: url });
+    fireSpeechEnd('ended');
+  });
+  // Handle pause — if something pauses the audio externally
+  audio.addEventListener('pause', function () {
+    // Only treat as speechend if the audio is past 90% of its duration (near end)
+    // or if it was paused externally (not by us)
+    if (audio.ended || (audio.duration && audio.currentTime >= audio.duration * 0.9)) {
+      fireSpeechEnd('pause_near_end');
+    } else {
+      console.warn(`🎤 Audio paused at ${audio.currentTime.toFixed(1)}s / ${(audio.duration || 0).toFixed(1)}s`);
     }
   });
@@ -204,17 +260,31 @@ function playAudioInMediaStream(url) {
     if (typeof __publishEvent === 'function') {
       __publishEvent('speecherror', { error: 'Audio playback failed', url: url });
     }
+    fireSpeechEnd('error');
   });
   // Start loading the audio
   audio.load();
 }
+// Keep a reference to the current speak Audio element so it doesn't get GC'd
+let currentSpeakAudio = null;
 // Helper function to stop current audio and reset to silence
 function stopCurrentAudio() {
+  // Stop the speak audio element if playing
+  if (currentSpeakAudio) {
+    try {
+      currentSpeakAudio.pause();
+      currentSpeakAudio.currentTime = 0;
+    } catch (e) {
+      console.warn('Error stopping speak audio:', e);
+    }
+    currentSpeakAudio = null;
+  }
   currentPlaybackNodes.forEach((sourceNode, index) => {
     try {
-      sourceNode.stop();
       sourceNode.disconnect();
       console.log(`🎤 Stopped audio source ${index}`);
     } catch (e) {
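In `audio_input_hooks.js`, several paths can now signal the end of a spoken prompt: the `ended` event, a pause near the end, an `error` event, a rejected `play()` promise, and a safety timeout. `fireSpeechEnd` guards these with a `speechEndFired` flag so `speechend` is published at most once. The sketch below isolates that pattern outside the browser. `safetyTimeoutMs` mirrors the timeout rule in the diff; `createSpeechEndSignal` is a hypothetical helper, not part of the package.

```javascript
// Mirrors the diff's rule: duration + 5 s, never less than 15 s;
// 20 s when the duration is unknown (NaN, or Infinity for live streams).
function safetyTimeoutMs(durationSeconds) {
  if (durationSeconds && Number.isFinite(durationSeconds)) {
    return Math.max(durationSeconds * 1000 + 5000, 15000);
  }
  return 20000;
}

// Hypothetical helper: returns a fire(reason) function that invokes onEnd only
// on its first call, and arms a safety timer that fires it if nothing else does.
function createSpeechEndSignal(onEnd, durationSeconds) {
  let fired = false;
  let timerId = null;
  function fire(reason) {
    if (fired) return; // later signals are ignored
    fired = true;
    if (timerId) clearTimeout(timerId); // cancel the safety timer
    onEnd(reason);
  }
  timerId = setTimeout(() => fire('safety_timeout'), safetyTimeoutMs(durationSeconds));
  return fire;
}

// Usage: 'ended' arrives first, so the later 'error' is ignored
// and the pending safety timer is cleared.
const reasons = [];
const fire = createSpeechEndSignal((reason) => reasons.push(reason), 3.2);
fire('ended');
fire('error');
console.log(reasons); // [ 'ended' ]
console.log(safetyTimeoutMs(3.2), safetyTimeoutMs(30), safetyTimeoutMs(NaN)); // 15000 35000 20000
```

The 15 s floor lines up with the 15 s `speechend` wait that `voice-agent-tester.js` now uses on the Node side, where a timeout is logged and treated as recoverable instead of thrown.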

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@telnyx/voice-agent-tester",
-  "version": "0.4.1",
+  "version": "0.4.4",
   "description": "A command-line tool to test voice agents using Puppeteer",
   "main": "src/index.js",
   "type": "module",

package/src/index.js CHANGED

@@ -100,12 +100,22 @@ function getCompareRequiredParams(argv) {
   switch (argv.provider) {
     case 'vapi':
       if (!argv.shareKey) {
-        missing.push({ key: 'shareKey', flag: '--share-key', description: 'Vapi share key' });
+        missing.push({
+          key: 'shareKey',
+          flag: '--share-key',
+          description: 'Vapi share key',
+          hint: 'In the Vapi Dashboard, select your assistant, then click the link icon (🔗) next to the assistant ID at the top. This copies the demo link containing your share key.'
+        });
       }
       break;
     case 'elevenlabs':
       if (!argv.branchId) {
-        missing.push({ key: 'branchId', flag: '--branch-id', description: 'ElevenLabs branch ID' });
+        missing.push({
+          key: 'branchId',
+          flag: '--branch-id',
+          description: 'ElevenLabs branch ID',
+          hint: 'In the ElevenLabs Dashboard, go to Agents, select your target agent, then click the dropdown next to Publish and select "Copy shareable link". This copies the demo link containing your branch ID.'
+        });
       }
       break;
     // retell and others: no extra params needed yet
@@ -317,6 +327,11 @@ const argv = yargs(hideBin(process.argv))
     description: 'Volume level for audio input (0.0 to 1.0)',
     default: 1.0
   })
+  .option('retries', {
+    type: 'number',
+    description: 'Number of retries for failed test runs (0 = no retries)',
+    default: 0
+  })
   .help()
   .argv;
@@ -399,22 +414,49 @@ async function runBenchmark({ applications, scenarios, repeat, concurrency, argv
       audioVolume: argv.audioVolume
     });
-    try {
-      await tester.runScenario(targetUrl, app.steps, scenario.steps, app.name, scenario.name, repetition);
-      console.log(`✅ Completed successfully (Run ${runNumber}/${totalRuns})`);
-      return { success: true };
-    } catch (error) {
-      // Store only the first line for summary, but print full message here (with diagnostics)
-      const shortMessage = error.message.split('\n')[0];
-      const errorInfo = {
-        app: app.name,
-        scenario: scenario.name,
-        repetition,
-        error: shortMessage
-      };
-      // Print full diagnostics here (only place they appear)
-      console.error(`❌ Error (Run ${runNumber}/${totalRuns}):\n${error.message}`);
-      return { success: false, error: errorInfo };
+    const maxAttempts = (argv.retries || 0) + 1;
+    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
+      // Create a fresh tester for each attempt (after first, original tester is closed)
+      const currentTester = attempt === 1 ? tester : new VoiceAgentTester({
+        verbose: argv.verbose,
+        headless: argv.headless,
+        assetsServerUrl: argv.assetsServer,
+        reportGenerator: reportGenerator,
+        record: argv.record,
+        debug: argv.debug,
+        audioUrl: argv.audioUrl,
+        audioVolume: argv.audioVolume
+      });
+      try {
+        await currentTester.runScenario(targetUrl, app.steps, scenario.steps, app.name, scenario.name, repetition);
+        console.log(`✅ Completed successfully (Run ${runNumber}/${totalRuns})`);
+        return { success: true };
+      } catch (error) {
+        const shortMessage = error.message.split('\n')[0];
+        if (attempt < maxAttempts) {
+          console.warn(`\n⚠️ Attempt ${attempt}/${maxAttempts} failed: ${shortMessage}`);
+          console.warn(`🔄 Retrying in 3s... (${maxAttempts - attempt} retries left)\n`);
+          await new Promise(r => setTimeout(r, 3000));
+          continue;
+        }
+        // Final attempt failed
+        const errorInfo = {
+          app: app.name,
+          scenario: scenario.name,
+          repetition,
+          error: shortMessage
+        };
+        // Print full diagnostics here (only place they appear)
+        console.error(`❌ Error (Run ${runNumber}/${totalRuns}):\n${error.message}`);
+        if (maxAttempts > 1) {
+          console.error(`   Failed after ${maxAttempts} attempts`);
+        }
+        return { success: false, error: errorInfo };
+      }
     }
   }
@@ -550,6 +592,9 @@ async function main() {
         if (missingParams.length > 0) {
           for (const param of missingParams) {
             console.log(`\n🔑 ${param.description} is required for comparison mode`);
+            if (param.hint) {
+              console.log(`   ${param.hint}`);
+            }
             const inputVal = await promptUserInput(`Enter ${param.description} (or press Enter to skip comparison): `);
             if (inputVal) {
               argv[param.key] = inputVal;
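The new `--retries` flag gives each benchmark run `retries + 1` attempts. Attempts are separated by a fixed 3 s pause, and every attempt after the first gets a freshly constructed `VoiceAgentTester`, because the previous one has already been closed. The control flow can be sketched generically as follows. `runWithRetries` and the `makeAttempt` callback are hypothetical names; in the package, the loop is inlined in `runBenchmark`.

```javascript
// Hypothetical generic form of the retry loop in runBenchmark (src/index.js, 0.4.4).
// makeAttempt(attempt) should build fresh state (e.g. a new tester) on each call.
async function runWithRetries(makeAttempt, { retries, delayMs = 3000 } = {}) {
  const maxAttempts = (retries || 0) + 1; // same formula as the source
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const value = await makeAttempt(attempt);
      return { success: true, attempts: attempt, value };
    } catch (error) {
      lastError = error;
      // Only the first line is used for the summary, as in the source.
      const shortMessage = String(error.message).split('\n')[0];
      if (attempt < maxAttempts) {
        console.warn(`Attempt ${attempt}/${maxAttempts} failed: ${shortMessage}; retrying in ${delayMs}ms`);
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  return {
    success: false,
    attempts: maxAttempts,
    error: String(lastError.message).split('\n')[0],
  };
}

// Usage: a flaky attempt that fails twice, then succeeds on the third try.
let calls = 0;
runWithRetries(async () => {
  calls += 1;
  if (calls < 3) throw new Error(`transient failure ${calls}\nfull diagnostics...`);
  return 'ok';
}, { retries: 2, delayMs: 10 }).then((result) => {
  console.log(result); // { success: true, attempts: 3, value: 'ok' }
});
```

Because `--retries` defaults to 0, existing invocations keep their single-attempt behavior.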

package/src/voice-agent-tester.js CHANGED

@@ -238,6 +238,7 @@ export class VoiceAgentTester {
           } else {
             errorMessage += '\n  (Could not collect browser diagnostics)';
           }
         }
         reject(new Error(errorMessage));
@@ -330,7 +331,8 @@ export class VoiceAgentTester {
     await this.page.exposeFunction('__publishEvent', (eventType, data) => {
       const event = { eventType, data, timestamp: Date.now() };
-      console.log(`\t📢 Event received: ${eventType}`);
+      const elementSuffix = data && data.elementId ? ` (audio element: ${data.elementId})` : '';
+      console.log(`\t📢 ${eventType}${elementSuffix}`);
       // Check if there are any pending promises waiting for this event type
       const pendingPromises = this.pendingPromises.get(eventType);
@@ -362,6 +364,7 @@ export class VoiceAgentTester {
         console.error(error.stack);
       }
     });
   }
   async close() {
@@ -534,6 +537,9 @@ export class VoiceAgentTester {
         case 'screenshot':
           handlerResult = await this.handleScreenshot(step);
           break;
+        case 'click_with_retry':
+          handlerResult = await this.handleClickWithRetry(step);
+          break;
         default:
           console.log(`Unknown action: ${action}`);
@@ -576,10 +582,173 @@ export class VoiceAgentTester {
     await this.page.click(selector);
   }
+  async handleClickWithRetry(step) {
+    const selector = step.selector;
+    if (!selector) {
+      throw new Error('No selector specified for click_with_retry action');
+    }
+    const maxRetries = step.retries || 2;
+    const retryDelay = step.retryDelay || 3000;
+    const checkDelay = step.checkDelay || 4000;
+    for (let attempt = 1; attempt <= maxRetries; attempt++) {
+      let clicked = false;
+      try {
+        await this.page.waitForSelector(selector, { timeout: attempt === 1 ? 30000 : 5000 });
+        await this.page.click(selector);
+        clicked = true;
+      } catch {
+        // Selector not found — will check for widget config errors below
+      }
+      if (!clicked) {
+        // Check if the widget is showing a configuration error
+        const widgetState = await this._getWidgetErrorState(selector);
+        if (widgetState.isConfigError) {
+          // Widget is showing "unauthenticated web calls" or similar config error.
+          // This means the API config hasn't propagated to the widget yet.
+          if (attempt < maxRetries) {
+            console.log(`\t⚠️ Click attempt ${attempt}/${maxRetries}: widget not ready — "${widgetState.errorText}"`);
+            console.log(`\t⏳ Waiting for configuration to propagate (reloading in ${retryDelay}ms)...`);
+            await this.sleep(retryDelay);
+            await this.page.reload({ waitUntil: 'networkidle0', timeout: 30000 });
+            await this.sleep(2000); // extra time after reload
+            continue;
+          }
+          throw new Error(
+            `Widget configuration not ready after ${maxRetries} attempts: "${widgetState.errorText}"\n` +
+            `The "Supports Unauthenticated Web Calls" setting may not have propagated yet.\n` +
+            `Try running again in a few seconds, or verify the setting in the Telnyx portal.`
+          );
+        }
+        // Not a config error — genuinely missing selector
+        if (attempt < maxRetries) {
+          console.log(`\t⚠️ Click attempt ${attempt}/${maxRetries}: selector not found, retrying in ${retryDelay}ms...`);
+          await this.sleep(retryDelay);
+          continue;
+        }
+        throw new Error(`Selector "${selector}" not found after ${maxRetries} attempts`);
+      }
+      console.log(`\t🖱️ Click attempt ${attempt}/${maxRetries}`);
+      // Wait for connection to establish
+      await this.sleep(checkDelay);
+      // Check if audio elements are monitored or WebRTC connections exist
+      const status = await this._checkConnectionStatus();
+      if (status.isConnected) {
+        console.log(`\t✅ Connection established (monitored: ${status.monitoredElements}, rtc: ${status.rtcConnections})`);
+        return;
+      }
+      if (attempt < maxRetries) {
+        console.log(`\t⚠️ No connection detected (monitored: ${status.monitoredElements}, rtc: ${status.rtcConnections}), retrying in ${retryDelay}ms...`);
+        await this.sleep(retryDelay);
+      } else {
+        console.log(`\t⚠️ No connection detected after ${maxRetries} attempts, proceeding anyway`);
+      }
+    }
+  }
+  /**
+   * Check if a widget is showing a configuration error (e.g., "unauthenticated web calls" not enabled).
+   * Inspects the shadow DOM for error indicators.
+   */
+  async _getWidgetErrorState(selector) {
+    const parts = selector.split('>>>').map(s => s.trim());
+    const hostSelector = parts[0];
+    return await this.page.evaluate((host) => {
+      const el = document.querySelector(host);
+      if (!el || !el.shadowRoot) return { isConfigError: false };
+      const text = el.shadowRoot.textContent || '';
+      // Check for known configuration error messages
+      const configErrors = [
+        'unauthenticated web calls',
+        'support unauthenticated',
+        'not configured',
+        'configuration required'
+      ];
+      const lowerText = text.toLowerCase();
+      for (const pattern of configErrors) {
+        if (lowerText.includes(pattern)) {
+          // Extract a readable error message
+          const errorText = text.trim().replace(/\s+/g, ' ').substring(0, 200);
+          return { isConfigError: true, errorText };
+        }
+      }
+      return { isConfigError: false };
+    }, hostSelector);
+  }
+  async _checkConnectionStatus() {
+    const status = await this.page.evaluate(() => {
+      const info = { monitoredElements: 0, hasActiveConnection: false };
+      if (window.audioMonitor && window.audioMonitor.monitoredElements) {
+        info.monitoredElements = window.audioMonitor.monitoredElements.size;
+      }
+      document.querySelectorAll('audio').forEach(el => {
+        if (el.srcObject) info.hasActiveConnection = true;
+      });
+      return info;
+    });
+    let rtcConnections = 0;
+    try {
+      const rtpStats = await this.page.evaluate(async () => {
+        if (typeof window.__getRtpStats === 'function') {
+          return await window.__getRtpStats();
+        }
+        return null;
+      });
+      if (rtpStats) rtcConnections = rtpStats.connectionCount || 0;
+    } catch {
+      // Ignore RTP stats errors
+    }
+    return {
+      monitoredElements: status.monitoredElements,
+      rtcConnections,
+      isConnected: status.monitoredElements > 0 || status.hasActiveConnection || rtcConnections > 0
+    };
+  }
   async handleWaitForVoice() {
     if (this.debug) {
       console.log('\t⏳ Waiting for audio to start (AI agent response)...');
     }
+    // Check if audio is already playing before waiting for a new event.
+    // This handles the case where audiostart fired before we started listening
+    // (e.g., during click_with_retry or between steps).
+    const alreadyPlaying = await this.page.evaluate(() => {
+      if (window.audioMonitor && window.audioMonitor.monitoredElements) {
+        for (const [, data] of window.audioMonitor.monitoredElements) {
+          if (data.isPlaying) return true;
+        }
+      }
+      return false;
+    });
+    if (alreadyPlaying) {
+      if (this.debug) {
+        console.log('\t✅ Audio already playing');
+      }
+      return;
+    }
     await this.waitForAudioEvent('audiostart');
     if (this.debug) {
       console.log('\t✅ Audio detected');
@@ -590,6 +759,27 @@ export class VoiceAgentTester {
     if (this.debug) {
       console.log('\t⏳ Waiting for audio to stop (silence)...');
     }
+    // Check if all monitored elements are already silent.
+    // This handles the case where audiostop fired before we started listening.
+    const allSilent = await this.page.evaluate(() => {
+      if (window.audioMonitor && window.audioMonitor.monitoredElements) {
+        if (window.audioMonitor.monitoredElements.size === 0) return false; // no elements yet
+        for (const [, data] of window.audioMonitor.monitoredElements) {
+          if (data.isPlaying) return false;
+        }
+        return true; // all elements exist and are silent
+      }
+      return false;
+    });
+    if (allSilent) {
+      if (this.debug) {
+        console.log('\t✅ Already silent');
+      }
+      return;
+    }
     await this.waitForAudioEvent('audiostop');
     if (this.debug) {
       console.log('\t✅ Silence detected');
@@ -678,10 +868,40 @@ export class VoiceAgentTester {
     // Wait for speech to complete by listening for speechend event
     try {
-      await this.waitForAudioEvent('speechend');
+      // Use a shorter timeout for speechend (15s) since we have safety fallback in browser
+      await this.waitForAudioEvent('speechend', 15000);
     } catch (error) {
-      console.error('Timeout waiting for speech to complete:', error.message);
-      throw error;
+      // speechend timeout is recoverable — the audio likely finished but the event was lost
+      // (e.g., agent started responding and disrupted the audio element)
+      if (this.debug) {
+        // Check the state of the speak audio in the browser
+        const speakState = await this.page.evaluate(() => {
+          const info = {
+            currentSpeakAudio: null,
+            audioContextState: null,
+          };
+          try {
+            if (window.currentSpeakAudio) {
+              info.currentSpeakAudio = {
+                paused: window.currentSpeakAudio.paused,
+                ended: window.currentSpeakAudio.ended,
+                currentTime: window.currentSpeakAudio.currentTime,
+                duration: window.currentSpeakAudio.duration,
+                readyState: window.currentSpeakAudio.readyState,
+              };
+            }
+            if (window.globalAudioContext) {
+              info.audioContextState = window.globalAudioContext.state;
+            }
+          } catch (e) { /* ignore */ }
+          return info;
+        }).catch(() => null);
+        console.warn(`\t⚠️ speechend timeout (recovered) — speak audio state:`, JSON.stringify(speakState));
+      } else {
+        console.warn(`\t⚠️ speechend timeout — continuing (audio likely finished)`);
+      }
+      // Don't throw — treat speechend timeout as recoverable
     }
   }
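Both `handleWaitForVoice` and the wait-for-silence handler now read the current state of `window.audioMonitor.monitoredElements` before calling `waitForAudioEvent`. Without that check, an `audiostart` or `audiostop` that was published before anyone subscribed would be missed, and the step would hang until its timeout. The sketch below reproduces the idea in plain Node. `AudioMonitorSketch` is a hypothetical in-memory stand-in for the page's audio monitor and the `__publishEvent` bridge. Its silence rule matches the diff: at least one element exists and none is playing. In the real tester, a small gap remains between the page check and the subscription, so the check narrows the race rather than eliminating it.

```javascript
// Hypothetical stand-in for window.audioMonitor plus the event bridge.
class AudioMonitorSketch {
  constructor() {
    this.elements = new Map(); // elementId -> { isPlaying }
    this.waiters = new Map();  // eventType -> array of resolver callbacks
  }
  setPlaying(elementId, isPlaying) {
    this.elements.set(elementId, { isPlaying });
    this.publish(isPlaying ? 'audiostart' : 'audiostop');
  }
  publish(eventType) {
    const pending = this.waiters.get(eventType) || [];
    this.waiters.delete(eventType); // events with no waiter are simply lost
    pending.forEach((resolve) => resolve(eventType));
  }
  waitForEvent(eventType, timeoutMs) {
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => reject(new Error(`Timeout waiting for ${eventType}`)), timeoutMs);
      const list = this.waiters.get(eventType) || [];
      list.push((value) => { clearTimeout(timer); resolve(value); });
      this.waiters.set(eventType, list);
    });
  }
  // Same rule as the diff: at least one element, and none currently playing.
  allSilent() {
    if (this.elements.size === 0) return false;
    for (const [, data] of this.elements) {
      if (data.isPlaying) return false;
    }
    return true;
  }
}

// Check current state first; only subscribe if silence has not already happened.
async function waitForSilence(monitor, timeoutMs) {
  if (monitor.allSilent()) return 'already silent';
  return monitor.waitForEvent('audiostop', timeoutMs);
}

// Usage: the agent stops speaking before the test starts waiting.
const monitor = new AudioMonitorSketch();
monitor.setPlaying('audio-1', true);
monitor.setPlaying('audio-1', false); // 'audiostop' published with no listener
waitForSilence(monitor, 1000).then((result) => console.log(result)); // already silent
```

Without the `allSilent()` check, the usage above would reject after 1 s even though the audio had already stopped.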

package/tests/integration.test.js CHANGED

@@ -44,8 +44,9 @@ describe('Integration Tests', () => {
               this.text = text;
             };
-            // Mock __speak function that will be called by the tester
-            // This needs to be in the page itself since evaluateOnNewDocument runs before navigation
+            // Mock __speak and __waitForMediaStream functions
+            // These override the injected audio hooks since inline scripts run after evaluateOnNewDocument
+            window.__waitForMediaStream = () => Promise.resolve();
             window.__speak = (text) => {
               document.getElementById('speech-output').textContent = text;
               // Signal speech end after a small delay to allow waitForAudioEvent to be set up
@@ -75,7 +76,7 @@ describe('Integration Tests', () => {
     // The scenario should complete without throwing errors
     expect(true).toBe(true);
-  });
+  }, 15000);
   test('should handle scenario with wait step', async () => {
     const testPageContent = `