@telnyx/voice-agent-tester 0.4.4 → 0.4.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -1,5 +1,24 @@
  # Changelog
 
+ ## [0.4.6](https://github.com/team-telnyx/voice-agent-tester/compare/v0.4.5...v0.4.6) (2026-03-18)
+
+ ### Bug Fixes
+
+ * silence detection + audio element discovery for ElevenLabs ([#29](https://github.com/team-telnyx/voice-agent-tester/issues/29)) ([789b98b](https://github.com/team-telnyx/voice-agent-tester/commit/789b98b2e91a8f0b9443110067dd17d50eaf2381))
+
+ ## [0.4.5](https://github.com/team-telnyx/voice-agent-tester/compare/v0.4.4...v0.4.5) (2026-03-16)
+
+ ### Bug Fixes
+
+ * add event-based fallback for audio monitoring (ElevenLabs support) ([#27](https://github.com/team-telnyx/voice-agent-tester/issues/27)) ([6051b5e](https://github.com/team-telnyx/voice-agent-tester/commit/6051b5e949376951f0fb046cffcc5a2a5c250e19))
+ * align comparison metrics by scenario step index, not absolute step number ([#23](https://github.com/team-telnyx/voice-agent-tester/issues/23)) ([e4c485b](https://github.com/team-telnyx/voice-agent-tester/commit/e4c485b6eae5e9a6d60f11745b46997a183fc180)), closes [#1](https://github.com/team-telnyx/voice-agent-tester/issues/1) [#2](https://github.com/team-telnyx/voice-agent-tester/issues/2)
+ * make ElevenLabs branch-id optional for comparison mode ([#24](https://github.com/team-telnyx/voice-agent-tester/issues/24)) ([3f1735a](https://github.com/team-telnyx/voice-agent-tester/commit/3f1735a6a02e6c1edc4b6e17a6be4087127bded8))
+ * single headline number in comparison, per-response in --debug ([#26](https://github.com/team-telnyx/voice-agent-tester/issues/26)) ([a482129](https://github.com/team-telnyx/voice-agent-tester/commit/a482129c1bfe49d28aca7dec8230d30e5b6d8f8a)), closes [#1](https://github.com/team-telnyx/voice-agent-tester/issues/1) [#2](https://github.com/team-telnyx/voice-agent-tester/issues/2)
+
+ ### Documentation
+
+ * restructure README with comparison mode front and center ([#25](https://github.com/team-telnyx/voice-agent-tester/issues/25)) ([f15cbcd](https://github.com/team-telnyx/voice-agent-tester/commit/f15cbcd8707cded8081d00b90accf09fd77be169))
+
  ## [0.4.4](https://github.com/team-telnyx/voice-agent-tester/compare/v0.4.3...v0.4.4) (2026-03-11)
 
  ### Features
package/README.md CHANGED
@@ -3,160 +3,119 @@
  [![CI](https://github.com/team-telnyx/voice-agent-tester/actions/workflows/ci.yml/badge.svg)](https://github.com/team-telnyx/voice-agent-tester/actions/workflows/ci.yml)
  [![npm version](https://img.shields.io/npm/v/@telnyx/voice-agent-tester.svg)](https://www.npmjs.com/package/@telnyx/voice-agent-tester)
 
- A CLI tool for automated benchmarking and testing of voice AI agents. Supports Telnyx, ElevenLabs, Vapi, and Retell.
+ Automated benchmarking CLI for voice AI agents. Import your assistant from any provider, run identical test scenarios on both platforms, and get a side-by-side latency comparison.
 
- ## Quick Start
+ Supports **Telnyx**, **ElevenLabs**, **Vapi**, and **Retell**.
 
- Run directly with npx (no installation required):
+ ## Compare Your Voice Agent Against Telnyx
 
- ```bash
- npx @telnyx/voice-agent-tester@latest -a applications/telnyx.yaml -s scenarios/appointment.yaml --assistant-id <YOUR_ASSISTANT_ID>
- ```
+ The tool imports your assistant from an external provider into Telnyx, then runs the **same scenario** on both platforms and produces a head-to-head latency report:
 
- Or install globally:
-
- ```bash
- npm install -g @telnyx/voice-agent-tester
- voice-agent-tester -a applications/telnyx.yaml -s scenarios/appointment.yaml --assistant-id <YOUR_ASSISTANT_ID>
+ ```
+ 📈 Latency Comparison (elapsed_time):
+ --------------------------------------------------------------------------------
+ Metric vapi Telnyx Delta Winner
+ --------------------------------------------------------------------------------
+ Response #1 (wait_for_voice_elapsed_time) 2849ms 1552ms -1297ms (-45.5%) 🏆 Telnyx
+ Response #2 (wait_for_voice_elapsed_time) 3307ms 704ms -2603ms (-78.7%) 🏆 Telnyx
+ --------------------------------------------------------------------------------
+
+ 📊 Overall Summary:
+ Compared 2 matched response latencies
+ vapi total latency: 6156ms
+ Telnyx total latency: 2256ms
+ Difference: -3900ms (-63.3%)
+
+ 🏆 Result: Telnyx is faster overall
  ```
 
- ## CLI Options
-
- | Option | Default | Description |
- |--------|---------|-------------|
- | `-a, --applications` | required | Application config path(s) or folder |
- | `-s, --scenarios` | required | Scenario config path(s) or folder |
- | `--assistant-id` | | Telnyx or provider assistant ID |
- | `--api-key` | | Telnyx API key for authentication |
- | `--provider` | | Import from provider (`vapi`, `elevenlabs`, `retell`) |
- | `--provider-api-key` | | External provider API key (required with `--provider`) |
- | `--provider-import-id` | | Provider assistant ID to import (required with `--provider`) |
- | `--share-key` | | Vapi share key for comparison mode (prompted if missing) |
- | `--branch-id` | | ElevenLabs branch ID for comparison mode (prompted if missing) |
- | `--compare` | `true` | Run both provider direct and Telnyx import benchmarks |
- | `--no-compare` | | Disable comparison (run only Telnyx import) |
- | `-d, --debug` | `false` | Enable detailed timeout diagnostics |
- | `-v, --verbose` | `false` | Show browser console logs |
- | `--headless` | `true` | Run browser in headless mode |
- | `--repeat` | `1` | Number of repetitions per combination |
- | `-c, --concurrency` | `1` | Number of parallel tests |
- | `-r, --report` | | Generate CSV report to specified file |
- | `-p, --params` | | URL template params (e.g., `key=value,key2=value2`) |
- | `--application-tags` | | Filter applications by comma-separated tags |
- | `--scenario-tags` | | Filter scenarios by comma-separated tags |
- | `--assets-server` | `http://localhost:3333` | Assets server URL |
- | `--audio-url` | | URL to audio file to play as input during entire benchmark |
- | `--audio-volume` | `1.0` | Volume level for audio input (0.0 to 1.0) |
-
- ## Bundled Configs
-
- | Application Config | Provider |
- |-------------------|----------|
- | `applications/telnyx.yaml` | Telnyx AI Widget |
- | `applications/elevenlabs.yaml` | ElevenLabs |
- | `applications/vapi.yaml` | Vapi |
- | `applications/retell.yaml` | Retell |
- | `applications/livetok.yaml` | Livetok |
-
- Scenarios:
- - `scenarios/appointment.yaml` - Basic appointment booking test
- - `scenarios/appointment_with_noise.yaml` - Appointment with background noise (pre-mixed audio)
-
- ## Background Noise Testing
-
- Test voice agents' performance with ambient noise (e.g., crowd chatter, cafe environment). Background noise is pre-mixed into audio files to simulate real-world conditions where users speak to voice agents in noisy environments.
-
- ### Running with Background Noise
+ ### Vapi vs Telnyx
 
  ```bash
- # Telnyx with background noise
- npx @telnyx/voice-agent-tester@latest \
- -a applications/telnyx.yaml \
- -s scenarios/appointment_with_noise.yaml \
- --assistant-id <YOUR_ASSISTANT_ID>
-
- # Compare with no noise (same assistant)
  npx @telnyx/voice-agent-tester@latest \
  -a applications/telnyx.yaml \
  -s scenarios/appointment.yaml \
- --assistant-id <YOUR_ASSISTANT_ID>
+ --provider vapi \
+ --share-key <VAPI_SHARE_KEY> \
+ --api-key <TELNYX_API_KEY> \
+ --provider-api-key <VAPI_API_KEY> \
+ --provider-import-id <VAPI_ASSISTANT_ID>
+ ```
+
+ ### ElevenLabs vs Telnyx
 
- # Generate CSV report with metrics
+ ```bash
  npx @telnyx/voice-agent-tester@latest \
  -a applications/telnyx.yaml \
- -s scenarios/appointment_with_noise.yaml \
- --assistant-id <YOUR_ASSISTANT_ID> \
- -r output/noise_benchmark.csv
+ -s scenarios/appointment.yaml \
+ --provider elevenlabs \
+ --api-key <TELNYX_API_KEY> \
+ --provider-api-key <ELEVENLABS_API_KEY> \
+ --provider-import-id <ELEVENLABS_AGENT_ID>
  ```
 
- ### Custom Audio Input from URL
-
- Play any audio file from a URL as input throughout the entire benchmark run. The audio is sent to the voice agent as microphone input.
+ ### Retell vs Telnyx
 
  ```bash
- # Use custom audio input from URL
  npx @telnyx/voice-agent-tester@latest \
  -a applications/telnyx.yaml \
  -s scenarios/appointment.yaml \
- --assistant-id <YOUR_ASSISTANT_ID> \
- --audio-url "https://example.com/test-audio.mp3" \
- --audio-volume 0.8
+ --provider retell \
+ --api-key <TELNYX_API_KEY> \
+ --provider-api-key <RETELL_API_KEY> \
+ --provider-import-id <RETELL_AGENT_ID>
  ```
 
- This is useful for:
- - Testing with custom audio inputs
- - Using longer audio tracks that play throughout the benchmark
- - A/B testing different audio sources
+ ### How Comparison Works
 
- ### Bundled Audio Files
+ 1. **Import** — The assistant is imported from the external provider into Telnyx
+ 2. **Phase 1: Provider Direct** — Runs the scenario on the provider's native widget
+ 3. **Phase 2: Telnyx Import** — Runs the same scenario on the Telnyx-imported assistant
+ 4. **Report** — Produces a side-by-side comparison with latency delta and winner per response
 
- | File | Description |
- |------|-------------|
- | `hello_make_an_appointment.mp3` | Clean appointment request |
- | `hello_make_an_appointment_with_noise.mp3` | Appointment request with crowd noise |
- | `appointment_data.mp3` | Clean appointment details |
- | `appointment_data_with_noise.mp3` | Appointment details with crowd noise |
+ ### Provider-Specific Keys
 
- ### Scenario Configuration
+ Some providers need an extra key to load their demo widget. If not passed via CLI, the tool prompts with instructions.
 
- The noise scenario uses pre-mixed audio files:
+ | Provider | Flag | Required? | How to find it |
+ |----------|------|-----------|----------------|
+ | Vapi | `--share-key` | Yes | Dashboard → select assistant → click 🔗 link icon next to the assistant ID |
+ | ElevenLabs | `--branch-id` | No | Dashboard → Agents → select agent → Publish dropdown → "Copy shareable link" |
 
- ```yaml
- # scenarios/appointment_with_noise.yaml
- tags:
- - default
- - noise
- steps:
- - action: wait_for_voice
- - action: wait_for_silence
- - action: sleep
- time: 1000
- - action: speak
- file: hello_make_an_appointment_with_noise.mp3
- - action: wait_for_voice
- metrics: elapsed_time
- - action: wait_for_silence
- - action: speak
- file: appointment_data_with_noise.mp3
- - action: wait_for_voice
- metrics: elapsed_time
+ ### Import Only (Skip Comparison)
+
+ To import without running the provider benchmark:
+
+ ```bash
+ npx @telnyx/voice-agent-tester@latest \
+ -a applications/telnyx.yaml \
+ -s scenarios/appointment.yaml \
+ --provider vapi \
+ --no-compare \
+ --api-key <TELNYX_API_KEY> \
+ --provider-api-key <VAPI_API_KEY> \
+ --provider-import-id <VAPI_ASSISTANT_ID>
  ```
 
- ### Metrics and Reports
+ ## Quick Start
 
- The benchmark collects response latency metrics at each `wait_for_voice` step with `metrics: elapsed_time`. Generated CSV reports include:
+ Run directly with npx (no installation required):
 
- ```csv
- app, scenario, repetition, success, duration, step_9_wait_for_voice_elapsed_time, step_12_wait_for_voice_elapsed_time
- telnyx, appointment_with_noise, 0, 1, 29654, 1631, 1225
+ ```bash
+ npx @telnyx/voice-agent-tester@latest \
+ -a applications/telnyx.yaml \
+ -s scenarios/appointment.yaml \
+ --assistant-id <YOUR_ASSISTANT_ID>
  ```
 
- Compare results with and without noise to measure how background noise affects your voice agent's:
- - Response latency
- - Speech recognition accuracy
- - Overall conversation flow
+ Or install globally:
 
- ## Examples
+ ```bash
+ npm install -g @telnyx/voice-agent-tester
+ voice-agent-tester -a applications/telnyx.yaml -s scenarios/appointment.yaml --assistant-id <YOUR_ASSISTANT_ID>
+ ```
+
+ ## Provider Examples
 
  ### Telnyx
 
@@ -185,78 +144,143 @@ npx @telnyx/voice-agent-tester@latest \
  --assistant-id <ASSISTANT_ID>
  ```
 
- ## Comparison Mode
+ ## CLI Reference
 
- When importing from an external provider, the tool automatically runs both benchmarks in sequence and generates a comparison report:
+ | Option | Default | Description |
+ |--------|---------|-------------|
+ | `-a, --applications` | required | Application config path(s) or folder |
+ | `-s, --scenarios` | required | Scenario config path(s) or folder |
+ | `--assistant-id` | | Telnyx or provider assistant ID |
+ | `--api-key` | | Telnyx API key |
+ | `--provider` | | Import from provider (`vapi`, `elevenlabs`, `retell`) |
+ | `--provider-api-key` | | External provider API key |
+ | `--provider-import-id` | | Provider assistant/agent ID to import |
+ | `--share-key` | | Vapi share key for comparison mode |
+ | `--branch-id` | | ElevenLabs branch ID (optional) |
+ | `--compare` | `true` | Run provider direct + Telnyx import benchmarks |
+ | `--no-compare` | | Skip provider direct benchmark |
+ | `-d, --debug` | `false` | Detailed timeout diagnostics |
+ | `-v, --verbose` | `false` | Show browser console logs |
+ | `--headless` | `true` | Run browser in headless mode |
+ | `--repeat` | `1` | Repetitions per app+scenario combination |
+ | `-c, --concurrency` | `1` | Parallel test runs |
+ | `-r, --report` | | CSV report output path |
+ | `-p, --params` | | URL template params (`key=value,key2=value2`) |
+ | `--retries` | `0` | Retry failed runs |
+ | `--application-tags` | | Filter applications by tags |
+ | `--scenario-tags` | | Filter scenarios by tags |
+ | `--record` | `false` | Record video+audio (webm) |
+ | `--audio-url` | | URL to audio file played as input during run |
+ | `--audio-volume` | `1.0` | Audio input volume (0.0–1.0) |
+ | `--assets-server` | `http://localhost:3333` | Assets server URL |
 
- 1. **Provider Direct** - Benchmarks the assistant on the original provider's widget
- 2. **Telnyx Import** - Benchmarks the same assistant after importing to Telnyx
+ ## Bundled Configs
 
- ### Provider-Specific Keys
+ **Applications:**
 
- Comparison mode requires a provider-specific key to load the provider's direct widget. If not passed via CLI, the tool will prompt you with instructions on how to find it.
+ | Config | Provider |
+ |--------|----------|
+ | `applications/telnyx.yaml` | Telnyx AI Widget |
+ | `applications/elevenlabs.yaml` | ElevenLabs |
+ | `applications/vapi.yaml` | Vapi |
+ | `applications/retell.yaml` | Retell |
+
+ **Scenarios:**
 
- | Provider | Flag | How to find it |
- |----------|------|----------------|
- | Vapi | `--share-key` | In the Vapi Dashboard, select your assistant, then click the link icon (🔗) next to the assistant ID at the top. This copies the demo link containing your share key. |
- | ElevenLabs | `--branch-id` | In the ElevenLabs Dashboard, go to Agents, select your target agent, then click the dropdown next to Publish and select "Copy shareable link". This copies the demo link containing your branch ID. |
+ | Config | Description |
+ |--------|-------------|
+ | `scenarios/appointment.yaml` | Appointment booking test |
+ | `scenarios/appointment_with_noise.yaml` | Appointment with background crowd noise |
 
- ### Import and Compare (Default)
+ ## Background Noise Testing
 
- **Vapi:**
+ Test how voice agents perform with ambient noise by using pre-mixed audio files:
 
  ```bash
+ # With background noise
+ npx @telnyx/voice-agent-tester@latest \
+ -a applications/telnyx.yaml \
+ -s scenarios/appointment_with_noise.yaml \
+ --assistant-id <ASSISTANT_ID>
+
+ # Without noise (same assistant, compare results)
  npx @telnyx/voice-agent-tester@latest \
  -a applications/telnyx.yaml \
  -s scenarios/appointment.yaml \
- --provider vapi \
- --share-key <VAPI_SHARE_KEY> \
- --api-key <TELNYX_KEY> \
- --provider-api-key <VAPI_KEY> \
- --provider-import-id <VAPI_ASSISTANT_ID>
+ --assistant-id <ASSISTANT_ID>
  ```
 
- **ElevenLabs:**
+ ### Custom Audio Input
+
+ Play any audio file from a URL as microphone input throughout the benchmark:
 
  ```bash
  npx @telnyx/voice-agent-tester@latest \
  -a applications/telnyx.yaml \
  -s scenarios/appointment.yaml \
- --provider elevenlabs \
- --branch-id <ELEVENLABS_BRANCH_ID> \
- --api-key <TELNYX_KEY> \
- --provider-api-key <ELEVENLABS_KEY> \
- --provider-import-id <ELEVENLABS_AGENT_ID>
+ --assistant-id <ASSISTANT_ID> \
+ --audio-url "https://example.com/test-audio.mp3" \
+ --audio-volume 0.8
  ```
 
- This will:
- - Run Phase 1: Provider direct benchmark
- - Run Phase 2: Telnyx import benchmark
- - Generate a side-by-side latency comparison report
+ ### Audio Assets
 
- ### Import Only (No Comparison)
+ | File | Description |
+ |------|-------------|
+ | `hello_make_an_appointment.mp3` | Clean appointment request |
+ | `hello_make_an_appointment_with_noise.mp3` | Appointment request + crowd noise |
+ | `appointment_data.mp3` | Clean appointment details |
+ | `appointment_data_with_noise.mp3` | Appointment details + crowd noise |
 
- To skip the provider direct benchmark and only run the Telnyx import:
+ ## Scenario Configuration
 
- ```bash
- npx @telnyx/voice-agent-tester@latest \
- -a applications/telnyx.yaml \
- -s scenarios/appointment.yaml \
- --provider vapi \
- --no-compare \
- --api-key <TELNYX_KEY> \
- --provider-api-key <VAPI_KEY> \
- --provider-import-id <VAPI_ASSISTANT_ID>
+ Scenarios are YAML files with a sequence of steps. Steps with `metrics: elapsed_time` are included in the latency report.
+
+ ```yaml
+ # scenarios/appointment.yaml
+ steps:
+ - action: wait_for_voice # Wait for agent greeting
+ - action: wait_for_silence # Wait for greeting to finish
+ - action: speak
+ file: hello_make_an_appointment.mp3
+ - action: wait_for_voice # ← Measured: time to first response
+ metrics: elapsed_time
+ - action: wait_for_silence
+ - action: speak
+ file: appointment_data.mp3
+ - action: wait_for_voice # ← Measured: time to second response
+ metrics: elapsed_time
  ```
 
- ### Debugging Failures
+ ### Available Actions
+
+ | Action | Description |
+ |--------|-------------|
+ | `speak` | Play audio (`file`) or synthesize text (`text`) as microphone input |
+ | `wait_for_voice` | Wait for the AI agent to start speaking |
+ | `wait_for_silence` | Wait for the AI agent to stop speaking |
+ | `sleep` | Pause for a fixed duration (`time` in ms) |
+ | `click` | Click an element (`selector`) |
+ | `click_with_retry` | Click with retries and connection verification |
+ | `wait_for_element` | Wait for a DOM element to appear |
+ | `type` | Type text into an input field |
+ | `fill` | Set an input field value directly |
+ | `select` | Select dropdown/checkbox/radio option |
+ | `screenshot` | Capture a screenshot |
+ | `listen` | Record agent audio, transcribe, and evaluate |
 
- If benchmarks fail, rerun with `--debug` for detailed diagnostics:
+ ## Debugging
+
+ If benchmarks fail or time out, use `--debug` for detailed diagnostics including audio monitor state, WebRTC connection info, and RTP stats:
 
  ```bash
- voice-agent-tester --provider vapi --debug [other options...]
+ npx @telnyx/voice-agent-tester@latest \
+ -a applications/telnyx.yaml \
+ -s scenarios/appointment.yaml \
+ --assistant-id <ASSISTANT_ID> \
+ --debug
  ```
 
  ## License
 
- MIT
+ MIT
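The README's sample summary above reduces to simple arithmetic: the delta is the Telnyx total minus the provider total, and the percentage is that delta relative to the provider total. A minimal sketch in plain JavaScript; `compareLatency` is our illustrative name, not a function from the CLI, and the exact rounding of the percentage is an assumption:

```javascript
// Hypothetical helper illustrating the report math: a negative delta means
// the Telnyx run was faster than the provider run.
function compareLatency(providerMs, telnyxMs) {
  const deltaMs = telnyxMs - providerMs;
  const deltaPct = (deltaMs / providerMs) * 100;
  return {
    deltaMs,
    deltaPct,
    winner: deltaMs < 0 ? 'Telnyx' : 'provider',
  };
}

// Totals from the sample report: 6156ms (vapi) vs 2256ms (Telnyx)
const result = compareLatency(6156, 2256);
console.log(result.deltaMs, result.winner); // -3900 'Telnyx'
```

With these inputs `deltaPct` is about -63.4; the sample report prints -63.3, so the CLI's rounding likely differs slightly from `toFixed`.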
package/applications/elevenlabs.yaml CHANGED
@@ -1,4 +1,4 @@
- url: "https://elevenlabs.io/app/talk-to?agent_id={{assistantId}}&branch_id={{branchId}}"
+ url: "https://elevenlabs.io/app/talk-to?agent_id={{assistantId}}"
  tags:
  - provider
  - elevenlabs
@@ -118,6 +118,7 @@ class AudioElementMonitor {
  this.scanExistingAudioElements();
  this.setupProgrammaticAudioInterception();
  this.setupShadowDomInterception();
+ this.startPeriodicScan();
  console.log("AudioElementMonitor initialized");
  }
 
@@ -324,6 +325,42 @@ class AudioElementMonitor {
  });
  }
 
+ /**
+ * Periodic scan for unmonitored audio elements.
+ * Catches elements that bypass interceptors (e.g., created in bundled code
+ * that captured native constructors, or appended to shadow DOMs not observed).
+ */
+ startPeriodicScan() {
+ setInterval(() => {
+ // Scan all audio elements in the main document
+ const allAudio = document.querySelectorAll('audio');
+ allAudio.forEach(audioEl => {
+ const elementId = this.getElementId(audioEl);
+ if (!this.monitoredElements.has(elementId) && (audioEl.srcObject || audioEl.src)) {
+ console.log(`Periodic scan found unmonitored audio element: ${elementId}`);
+ if (audioEl.srcObject && audioEl.srcObject instanceof MediaStream) {
+ this.monitorAudioElement(audioEl, elementId);
+ } else if (audioEl.src) {
+ this.monitorProgrammaticAudioElement(audioEl, elementId);
+ }
+ }
+ });
+
+ // Also scan programmatic Audio instances that were intercepted but never monitored
+ programmaticAudioInstances.forEach(audioEl => {
+ const elementId = this.getElementId(audioEl);
+ if (!this.monitoredElements.has(elementId) && (audioEl.srcObject || audioEl.src)) {
+ console.log(`Periodic scan found unmonitored programmatic audio: ${elementId}`);
+ if (audioEl.srcObject && audioEl.srcObject instanceof MediaStream) {
+ this.monitorAudioElement(audioEl, elementId);
+ } else if (audioEl.src) {
+ this.monitorProgrammaticAudioElement(audioEl, elementId);
+ }
+ }
+ });
+ }, 2000);
+ }
+
  handleAudioElement(audioElement) {
  const elementId = this.getElementId(audioElement);
 
@@ -526,10 +563,98 @@ class AudioElementMonitor {
 
  console.log(`Started monitoring programmatic audio element: ${elementId}`);
  } catch (error) {
- console.error(`Failed to monitor programmatic audio element ${elementId}:`, error);
+ console.error(`Failed to monitor programmatic audio element ${elementId} via analyser:`, error.message);
+ console.log(`Falling back to event-based monitoring for ${elementId}`);
+ this.monitorViaEvents(audioElement, elementId);
  }
  }
 
+ /**
+ * Fallback monitoring using audio element events (timeupdate/playing/pause).
+ * Used when AudioContext-based monitoring fails (e.g., when the audio element
+ * is already connected to another AudioContext via MediaStreamDestination).
+ */
+ monitorViaEvents(audioElement, elementId) {
+ const monitorData = {
+ element: audioElement,
+ source: null,
+ analyser: null,
+ dataArray: null,
+ isPlaying: false,
+ lastAudioTime: 0,
+ silenceThreshold: 10,
+ checkInterval: null,
+ isProgrammatic: true,
+ eventBased: true
+ };
+
+ this.monitoredElements.set(elementId, monitorData);
+
+ // Use timeupdate to detect audio activity — fires ~4x/sec during playback
+ let lastTimeUpdate = 0;
+ let silenceTimeoutId = null;
+ const SILENCE_DELAY = 1500; // ms of no timeupdate before declaring silence
+
+ const resetSilenceTimer = () => {
+ if (silenceTimeoutId) clearTimeout(silenceTimeoutId);
+ silenceTimeoutId = setTimeout(() => {
+ if (monitorData.isPlaying) {
+ monitorData.isPlaying = false;
+ this.dispatchAudioEvent('audiostop', elementId, audioElement);
+ if (typeof window.__publishEvent === 'function') {
+ window.__publishEvent('audiostop', { elementId, timestamp: Date.now() });
+ }
+ console.log(`Audio stopped (event-based): ${elementId}`);
+ }
+ }, SILENCE_DELAY);
+ };
+
+ audioElement.addEventListener('timeupdate', () => {
+ const now = Date.now();
+ // timeupdate fires even when seeking; only count if currentTime advances
+ if (audioElement.currentTime > 0 && now - lastTimeUpdate > 50) {
+ lastTimeUpdate = now;
+ monitorData.lastAudioTime = now;
+
+ if (!monitorData.isPlaying) {
+ monitorData.isPlaying = true;
+ this.dispatchAudioEvent('audiostart', elementId, audioElement);
+ if (typeof window.__publishEvent === 'function') {
+ window.__publishEvent('audiostart', { elementId, timestamp: Date.now() });
+ }
+ console.log(`Audio started (event-based): ${elementId}`);
+ }
+ resetSilenceTimer();
+ }
+ });
+
+ audioElement.addEventListener('pause', () => {
+ if (monitorData.isPlaying) {
+ monitorData.isPlaying = false;
+ if (silenceTimeoutId) clearTimeout(silenceTimeoutId);
+ this.dispatchAudioEvent('audiostop', elementId, audioElement);
+ if (typeof window.__publishEvent === 'function') {
+ window.__publishEvent('audiostop', { elementId, timestamp: Date.now() });
+ }
+ console.log(`Audio stopped (event-based, pause): ${elementId}`);
+ }
+ });
+
+ audioElement.addEventListener('ended', () => {
+ if (monitorData.isPlaying) {
+ monitorData.isPlaying = false;
+ if (silenceTimeoutId) clearTimeout(silenceTimeoutId);
+ this.dispatchAudioEvent('audiostop', elementId, audioElement);
+ if (typeof window.__publishEvent === 'function') {
+ window.__publishEvent('audiostop', { elementId, timestamp: Date.now() });
+ }
+ console.log(`Audio stopped (event-based, ended): ${elementId}`);
+ }
+ });
+
+ console.log(`Started event-based monitoring for programmatic audio element: ${elementId}`);
+ }
+
  monitorAudioElement(audioElement, elementId) {
  if (!this.audioContext) {
  console.warn("AudioContext not available, cannot monitor audio");
@@ -564,7 +689,9 @@ class AudioElementMonitor {
 
  console.log(`Started monitoring audio element: ${elementId}`);
  } catch (error) {
- console.error(`Failed to monitor audio element ${elementId}:`, error);
+ console.error(`Failed to monitor audio element ${elementId} via analyser:`, error.message);
+ console.log(`Falling back to event-based monitoring for ${elementId}`);
+ this.monitorViaEvents(audioElement, elementId);
  }
  }
 
@@ -572,10 +699,18 @@ class AudioElementMonitor {
  const { analyser, dataArray, silenceThreshold } = monitorData;
 
  monitorData.checkInterval = setInterval(() => {
- analyser.getByteFrequencyData(dataArray);
-
- const average = dataArray.reduce((sum, value) => sum + value, 0) / dataArray.length;
- const hasAudio = average > silenceThreshold;
+ // Use time-domain data for silence detection — more robust than frequency data.
+ // Time-domain bytes center at 128 for silence; we measure RMS deviation.
+ // This avoids false positives from FFT noise floor / RTP comfort noise.
+ analyser.getByteTimeDomainData(dataArray);
+
+ let sumSquares = 0;
+ for (let i = 0; i < dataArray.length; i++) {
+ const deviation = dataArray[i] - 128;
+ sumSquares += deviation * deviation;
+ }
+ const rms = Math.sqrt(sumSquares / dataArray.length);
+ const hasAudio = rms > silenceThreshold;
 
  // if (i++ % 10 == 0) {
  // console.log(`Average: ${average} hasAudio: ${hasAudio} elementId: ${elementId}`);
@@ -710,13 +845,16 @@ window.__getAudioDiagnostics = function() {
  audioMonitor.monitoredElements.forEach((monitorData, elementId) => {
  const { analyser, dataArray, silenceThreshold, isPlaying, lastAudioTime, isProgrammatic } = monitorData;
 
- // Get current audio level if analyser is available
- let currentLevel = null;
- let currentMaxLevel = null;
+ // Get current audio level if analyser is available (RMS of time-domain deviation)
+ let currentRms = null;
  if (analyser && dataArray) {
- analyser.getByteFrequencyData(dataArray);
- currentLevel = dataArray.reduce((sum, value) => sum + value, 0) / dataArray.length;
- currentMaxLevel = Math.max(...dataArray);
+ analyser.getByteTimeDomainData(dataArray);
+ let sumSquares = 0;
+ for (let j = 0; j < dataArray.length; j++) {
+ const deviation = dataArray[j] - 128;
+ sumSquares += deviation * deviation;
+ }
+ currentRms = Math.sqrt(sumSquares / dataArray.length);
  }
 
  diagnostics.elements.push({
@@ -724,9 +862,8 @@ window.__getAudioDiagnostics = function() {
  isPlaying,
  isProgrammatic: !!isProgrammatic,
  silenceThreshold,
- currentAudioLevel: currentLevel !== null ? currentLevel.toFixed(2) : 'unavailable',
- currentMaxLevel: currentMaxLevel !== null ? currentMaxLevel : 'unavailable',
- wouldTriggerAudioStart: currentLevel !== null ? currentLevel > silenceThreshold : 'unknown',
+ currentAudioLevel: currentRms !== null ? currentRms.toFixed(2) : 'unavailable',
+ wouldTriggerAudioStart: currentRms !== null ? currentRms > silenceThreshold : 'unknown',
  lastAudioTime,
  timeSinceLastAudio: lastAudioTime ? Date.now() - lastAudioTime : null
  });
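The new silence check in the diff above can be exercised in isolation. A standalone sketch of the same RMS computation (`rmsDeviation` is our name for it, not an identifier from the package), operating on a byte array like the one `AnalyserNode.getByteTimeDomainData` fills:

```javascript
// Time-domain bytes center at 128 when the signal is silent, so the RMS of
// the deviation from 128 is near 0 for silence and grows with amplitude.
function rmsDeviation(bytes) {
  let sumSquares = 0;
  for (let i = 0; i < bytes.length; i++) {
    const deviation = bytes[i] - 128;
    sumSquares += deviation * deviation;
  }
  return Math.sqrt(sumSquares / bytes.length);
}

const silenceThreshold = 10; // same default the monitor's monitorData uses

const silence = new Uint8Array(1024).fill(128); // flat line: RMS 0
// Square wave alternating +/-50 around the 128 center: RMS exactly 50
const tone = new Uint8Array(1024).map((_, i) => (i % 2 === 0 ? 178 : 78));

console.log(rmsDeviation(silence) > silenceThreshold); // false
console.log(rmsDeviation(tone) > silenceThreshold);    // true
```

A frequency-domain average (the old check) never returns to zero on a noisy stream, which is why the RMS of time-domain deviation is the more reliable silence test here.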
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@telnyx/voice-agent-tester",
- "version": "0.4.4",
+ "version": "0.4.6",
  "description": "A command-line tool to test voice agents using Puppeteer",
  "main": "src/index.js",
  "type": "module",
package/src/index.js CHANGED
@@ -109,14 +109,7 @@ function getCompareRequiredParams(argv) {
109
109
  }
110
110
  break;
111
111
  case 'elevenlabs':
112
- if (!argv.branchId) {
113
- missing.push({
114
- key: 'branchId',
115
- flag: '--branch-id',
116
- description: 'ElevenLabs branch ID',
117
- hint: 'In the ElevenLabs Dashboard, go to Agents, select your target agent, then click the dropdown next to Publish and select "Copy shareable link". This copies the demo link containing your branch ID.'
118
- });
119
- }
112
+ // branchId is optional — the talk-to URL works with just agent_id
120
113
  break;
121
114
  // retell and others: no extra params needed yet
122
115
  }
@@ -134,8 +127,22 @@ function getCompareTemplateParams(argv) {
  switch (argv.provider) {
  case 'vapi':
  return { shareKey: argv.shareKey };
+ default:
+ return {};
+ }
+ }
+
+ /**
+ * Get provider-specific extra query parameters to append to the comparison URL.
+ * Unlike template params, these are appended as-is (not substituted into {{...}} placeholders).
+ *
+ * @param {Object} argv - Parsed CLI arguments
+ * @returns {Object} Key-value pairs to append as query parameters
+ */
+ function getCompareExtraQueryParams(argv) {
+ switch (argv.provider) {
  case 'elevenlabs':
- return { branchId: argv.branchId };
+ return argv.branchId ? { branch_id: argv.branchId } : {};
  default:
  return {};
  }
@@ -295,7 +302,7 @@ const argv = yargs(hideBin(process.argv))
  })
  .option('branch-id', {
  type: 'string',
- description: 'ElevenLabs branch ID for direct widget testing (required for comparison mode with --provider elevenlabs)'
+ description: 'ElevenLabs branch ID for direct widget testing (optional, appended to demo URL when provided)'
  })
  .option('assistant-id', {
  type: 'string',
@@ -710,7 +717,19 @@ async function main() {
  throw new Error(`Provider application config not found: ${providerAppPath}\nPlease create applications/${argv.provider}.yaml for direct provider benchmarking.`);
  }

- const providerApplications = [loadApplicationConfig(providerAppPath, providerParams)];
+ const providerApp = loadApplicationConfig(providerAppPath, providerParams);
+
+ // Append optional extra query parameters (e.g. branch_id for ElevenLabs)
+ const extraQueryParams = getCompareExtraQueryParams(argv);
+ if (providerApp.url && Object.keys(extraQueryParams).length > 0) {
+ const url = new URL(providerApp.url);
+ for (const [key, value] of Object.entries(extraQueryParams)) {
+ url.searchParams.set(key, value);
+ }
+ providerApp.url = url.toString();
+ }
+
+ const providerApplications = [providerApp];

  const providerResults = await runBenchmark({
  applications: providerApplications,
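The URL-merging step in the hunk above relies on the WHATWG `URL` API rather than string concatenation, which keeps existing query parameters intact. A minimal standalone sketch (the helper name `appendQueryParams` and the example URL are mine, for illustration only):

```javascript
// Sketch of how optional extra query parameters are merged into a provider URL,
// mirroring what the diff does with getCompareExtraQueryParams output.
function appendQueryParams(baseUrl, extraQueryParams) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(extraQueryParams)) {
    url.searchParams.set(key, value); // set() also overwrites a same-named key
  }
  return url.toString();
}
```

With a demo link that already carries `agent_id`, passing `{ branch_id: 'xyz' }` appends `&branch_id=xyz`; passing `{}` leaves the URL unchanged.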
@@ -762,7 +781,7 @@ async function main() {
  telnyxReportGenerator.generateMetricsSummary();

  // Generate comparison report
- ReportGenerator.generateComparisonSummary(providerReportGenerator, telnyxReportGenerator, argv.provider);
+ ReportGenerator.generateComparisonSummary(providerReportGenerator, telnyxReportGenerator, argv.provider, { debug: argv.debug });

  // Generate CSV reports if requested
  if (argv.report) {
package/src/report.js CHANGED
@@ -28,7 +28,7 @@ export class ReportGenerator {
  });
  }

- recordStepMetric(appName, scenarioName, repetition, stepIndex, action, name, value) {
+ recordStepMetric(appName, scenarioName, repetition, stepIndex, action, name, value, scenarioStepIndex = null) {
  const key = this._getRunKey(appName, scenarioName, repetition);
  const run = this.runs.get(key);

@@ -52,6 +52,26 @@ export class ReportGenerator {
  if (!this.stepColumns.get(stepIndex).has(name)) {
  this.stepColumns.get(stepIndex).set(name, `step_${stepIndex + 1}_${action}_${name}`);
  }
+
+ // Track scenario step index for cross-provider comparison alignment
+ if (scenarioStepIndex !== null) {
+ if (!this.scenarioStepMap) {
+ this.scenarioStepMap = new Map();
+ }
+ // Map absolute stepIndex -> scenarioStepIndex (1-based)
+ this.scenarioStepMap.set(stepIndex, scenarioStepIndex);
+
+ // Track scenario-based column names for comparison display
+ if (!this.scenarioStepColumns) {
+ this.scenarioStepColumns = new Map();
+ }
+ if (!this.scenarioStepColumns.has(scenarioStepIndex)) {
+ this.scenarioStepColumns.set(scenarioStepIndex, new Map());
+ }
+ if (!this.scenarioStepColumns.get(scenarioStepIndex).has(name)) {
+ this.scenarioStepColumns.get(scenarioStepIndex).set(name, `scenario_step_${scenarioStepIndex}_${action}_${name}`);
+ }
+ }
  }

  endRun(appName, scenarioName, repetition, success = true) {
@@ -285,119 +305,178 @@ export class ReportGenerator {
  return result;
  }

+ /**
+ * Get aggregated metrics keyed by scenario step index for cross-provider comparison.
+ * Returns a Map of scenarioStepIndex -> { metricName -> { avg, min, max, p50, columnName } }
+ */
+ getAggregatedMetricsByScenarioStep() {
+ const result = new Map();
+
+ // Build reverse map: absolute stepIndex -> scenarioStepIndex
+ const scenarioStepMap = this.scenarioStepMap || new Map();
+
+ // Collect values grouped by scenarioStepIndex
+ const grouped = new Map(); // scenarioStepIndex -> metricName -> values[]
+
+ this.allRunsData.forEach(run => {
+ run.stepMetrics.forEach((metrics, stepIndex) => {
+ const scenarioIdx = scenarioStepMap.get(stepIndex);
+ if (scenarioIdx == null) return; // skip steps without scenario mapping
+
+ if (!grouped.has(scenarioIdx)) {
+ grouped.set(scenarioIdx, new Map());
+ }
+
+ metrics.forEach((value, metricName) => {
+ if (!grouped.get(scenarioIdx).has(metricName)) {
+ grouped.get(scenarioIdx).set(metricName, []);
+ }
+ grouped.get(scenarioIdx).get(metricName).push(value);
+ });
+ });
+ });
+
+ grouped.forEach((metricMap, scenarioIdx) => {
+ const stepResult = new Map();
+
+ metricMap.forEach((values, metricName) => {
+ if (values.length > 0) {
+ const sum = values.reduce((a, b) => a + b, 0);
+ const avg = sum / values.length;
+ const min = Math.min(...values);
+ const max = Math.max(...values);
+
+ const sortedValues = [...values].sort((a, b) => a - b);
+ let p50;
+ if (sortedValues.length % 2 === 0) {
+ p50 = (sortedValues[sortedValues.length / 2 - 1] + sortedValues[sortedValues.length / 2]) / 2;
+ } else {
+ p50 = sortedValues[Math.floor(sortedValues.length / 2)];
+ }
+
+ const columnName = this.scenarioStepColumns?.get(scenarioIdx)?.get(metricName) ||
+ `scenario_step_${scenarioIdx}_${metricName}`;
+
+ stepResult.set(metricName, { avg, min, max, p50, columnName });
+ }
+ });
+
+ result.set(scenarioIdx, stepResult);
+ });
+
+ return result;
+ }
+
  /**
  * Generate a comparison summary between two providers.
+ * Aligns metrics by scenario step index so that identical scenario steps
+ * are compared regardless of different application setup steps.
  * @param {ReportGenerator} providerReport - Report from the provider benchmark
  * @param {ReportGenerator} telnyxReport - Report from the Telnyx benchmark
  * @param {string} providerName - Name of the external provider
  */
- static generateComparisonSummary(providerReport, telnyxReport, providerName) {
+ static generateComparisonSummary(providerReport, telnyxReport, providerName, { debug = false } = {}) {
  console.log('\n' + '='.repeat(80));
- console.log('📊 COMPARISON SUMMARY: ' + providerName.toUpperCase() + ' vs TELNYX');
+ console.log('📊 COMPARISON: ' + providerName.toUpperCase() + ' vs TELNYX');
  console.log('='.repeat(80));

- const providerMetrics = providerReport.getAggregatedMetrics();
- const telnyxMetrics = telnyxReport.getAggregatedMetrics();
+ // Use scenario-step-aligned metrics for comparison
+ const providerMetrics = providerReport.getAggregatedMetricsByScenarioStep();
+ const telnyxMetrics = telnyxReport.getAggregatedMetricsByScenarioStep();

- // Find all unique step indices from both reports
- const allStepIndices = new Set([
+ // Find matched scenario steps (present in both providers)
+ const allScenarioSteps = new Set([
  ...providerMetrics.keys(),
  ...telnyxMetrics.keys()
  ]);
- const sortedIndices = Array.from(allStepIndices).sort((a, b) => a - b);
+ const sortedIndices = Array.from(allScenarioSteps).sort((a, b) => a - b);
+
+ // Collect matched latencies
+ const providerLatencies = [];
+ const telnyxLatencies = [];
+ const perResponse = []; // for debug output

- if (sortedIndices.length === 0) {
- console.log('No metrics available for comparison.');
+ sortedIndices.forEach(scenarioStep => {
+ const providerElapsed = providerMetrics.get(scenarioStep)?.get('elapsed_time');
+ const telnyxElapsed = telnyxMetrics.get(scenarioStep)?.get('elapsed_time');
+
+ if (providerElapsed && telnyxElapsed) {
+ providerLatencies.push(providerElapsed.avg);
+ telnyxLatencies.push(telnyxElapsed.avg);
+ perResponse.push({
+ providerAvg: providerElapsed.avg,
+ telnyxAvg: telnyxElapsed.avg,
+ columnName: providerElapsed.columnName || telnyxElapsed.columnName
+ });
+ }
+ });
+
+ if (providerLatencies.length === 0) {
+ console.log('\n⚠️ No comparable metrics found between providers.');
+ console.log('='.repeat(80));
  return;
  }

- // Compare elapsed_time metrics (primary latency indicator)
- console.log('\n📈 Latency Comparison (elapsed_time):');
- console.log('-'.repeat(80));
- console.log(
- 'Step'.padEnd(40) +
- providerName.padEnd(12) +
- 'Telnyx'.padEnd(12) +
- 'Delta'.padEnd(12) +
- 'Winner'
- );
- console.log('-'.repeat(80));
-
- sortedIndices.forEach(stepIndex => {
- const providerStep = providerMetrics.get(stepIndex);
- const telnyxStep = telnyxMetrics.get(stepIndex);
-
- const providerElapsed = providerStep?.get('elapsed_time');
- const telnyxElapsed = telnyxStep?.get('elapsed_time');
-
- if (providerElapsed || telnyxElapsed) {
- const columnName = providerElapsed?.columnName || telnyxElapsed?.columnName || `step_${stepIndex + 1}`;
- const shortName = columnName.length > 38 ? columnName.substring(0, 35) + '...' : columnName;
-
- const providerAvg = providerElapsed ? Math.round(providerElapsed.avg) : '-';
- const telnyxAvg = telnyxElapsed ? Math.round(telnyxElapsed.avg) : '-';
-
- let delta = '-';
- let winner = '-';
-
- if (providerElapsed && telnyxElapsed) {
- const diff = telnyxElapsed.avg - providerElapsed.avg;
- const pct = ((diff / providerElapsed.avg) * 100).toFixed(1);
- delta = diff > 0 ? `+${Math.round(diff)}ms` : `${Math.round(diff)}ms`;
-
- if (Math.abs(diff) < 50) {
- winner = '≈ Tie';
- } else if (diff < 0) {
- winner = '🏆 Telnyx';
- } else {
- winner = `🏆 ${providerName}`;
- }
- delta += ` (${pct}%)`;
+ // Debug: show per-response breakdown
+ if (debug && perResponse.length > 0) {
+ console.log('\n📈 Per-response breakdown:');
+ console.log('-'.repeat(80));
+ console.log(
+ 'Response'.padEnd(40) +
+ providerName.padEnd(12) +
+ 'Telnyx'.padEnd(12) +
+ 'Delta'.padEnd(16) +
+ 'Winner'
+ );
+ console.log('-'.repeat(80));
+
+ perResponse.forEach((r, i) => {
+ const action = (r.columnName || '').replace(/^scenario_step_\d+_/, '');
+ const label = `#${i + 1} (${action})`;
+ const shortLabel = label.length > 38 ? label.substring(0, 35) + '...' : label;
+
+ const diff = r.telnyxAvg - r.providerAvg;
+ const pct = ((diff / r.providerAvg) * 100).toFixed(1);
+ const delta = `${diff > 0 ? '+' : ''}${Math.round(diff)}ms (${pct}%)`;
+
+ let winner;
+ if (Math.abs(diff) < 50) {
+ winner = '≈ Tie';
+ } else if (diff < 0) {
+ winner = '🏆 Telnyx';
+ } else {
+ winner = `🏆 ${providerName}`;
  }

  console.log(
- shortName.padEnd(40) +
- `${providerAvg}ms`.padEnd(12) +
- `${telnyxAvg}ms`.padEnd(12) +
- delta.padEnd(12) +
+ shortLabel.padEnd(40) +
+ `${Math.round(r.providerAvg)}ms`.padEnd(12) +
+ `${Math.round(r.telnyxAvg)}ms`.padEnd(12) +
+ delta.padEnd(16) +
  winner
  );
- }
- });
-
- console.log('-'.repeat(80));
+ });

- // Summary
- let providerTotal = 0, telnyxTotal = 0, comparableSteps = 0;
- sortedIndices.forEach(stepIndex => {
- const providerStep = providerMetrics.get(stepIndex);
- const telnyxStep = telnyxMetrics.get(stepIndex);
- const providerElapsed = providerStep?.get('elapsed_time');
- const telnyxElapsed = telnyxStep?.get('elapsed_time');
-
- if (providerElapsed && telnyxElapsed) {
- providerTotal += providerElapsed.avg;
- telnyxTotal += telnyxElapsed.avg;
- comparableSteps++;
- }
- });
+ console.log('-'.repeat(80));
+ }

- if (comparableSteps > 0) {
- const totalDiff = telnyxTotal - providerTotal;
- const totalPct = ((totalDiff / providerTotal) * 100).toFixed(1);
-
- console.log('\n📊 Overall Summary:');
- console.log(` ${providerName} total latency: ${Math.round(providerTotal)}ms`);
- console.log(` Telnyx total latency: ${Math.round(telnyxTotal)}ms`);
- console.log(` Difference: ${totalDiff > 0 ? '+' : ''}${Math.round(totalDiff)}ms (${totalPct}%)`);
-
- if (Math.abs(totalDiff) < 100) {
- console.log('\n 🤝 Result: Both providers perform similarly');
- } else if (totalDiff < 0) {
- console.log('\n 🏆 Result: Telnyx is faster overall');
- } else {
- console.log(`\n 🏆 Result: ${providerName} is faster overall`);
- }
+ // One headline number: average response latency
+ const providerAvg = providerLatencies.reduce((a, b) => a + b, 0) / providerLatencies.length;
+ const telnyxAvg = telnyxLatencies.reduce((a, b) => a + b, 0) / telnyxLatencies.length;
+ const diff = telnyxAvg - providerAvg;
+ const pct = ((diff / providerAvg) * 100).toFixed(1);
+
+ console.log(`\n Average response latency (${providerLatencies.length} matched responses):\n`);
+ console.log(` ${providerName.padEnd(16)} ${Math.round(providerAvg)}ms`);
+ console.log(` ${'Telnyx'.padEnd(16)} ${Math.round(telnyxAvg)}ms`);
+ console.log(` ${'Difference'.padEnd(16)} ${diff > 0 ? '+' : ''}${Math.round(diff)}ms (${pct}%)`);
+
+ if (Math.abs(diff) < 50) {
+ console.log('\n 🤝 Result: Both providers perform similarly');
+ } else if (diff < 0) {
+ console.log(`\n 🏆 Telnyx is ${Math.abs(pct)}% faster`);
+ } else {
+ console.log(`\n 🏆 ${providerName} is ${Math.abs(pct)}% faster`);
  }

  console.log('\n' + '='.repeat(80));
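The p50 computation inside `getAggregatedMetricsByScenarioStep` above is a standard median: sort a copy, then take the middle element, or the mean of the two middle elements for even-length input. Extracted as a small standalone function (the name `p50` matches the metric field; the helper itself is illustrative, not part of the package):

```javascript
// Standalone sketch of the p50 (median) logic used in the hunk above.
function p50(values) {
  const sorted = [...values].sort((a, b) => a - b); // numeric sort on a copy
  const mid = sorted.length / 2;
  if (sorted.length % 2 === 0) {
    return (sorted[mid - 1] + sorted[mid]) / 2; // mean of the two middle values
  }
  return sorted[Math.floor(mid)];
}
```

Note the explicit `(a, b) => a - b` comparator: JavaScript's default `sort` compares as strings, which would misorder latencies like `[100, 20, 3]`.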
@@ -490,7 +490,7 @@ export class VoiceAgentTester {
  }
  }

- async executeStep(step, stepIndex, appName = '', scenarioName = '', repetition = 1) {
+ async executeStep(step, stepIndex, appName = '', scenarioName = '', repetition = 1, scenarioStepIndex = null) {
  if (!this.page) {
  throw new Error('Browser not launched. Call launch() first.');
  }
@@ -553,13 +553,13 @@ export class VoiceAgentTester {
  // Record metrics for report if enabled and step has metrics attribute
  if (this.reportGenerator && step.metrics) {
  if (step.metrics.includes('elapsed_time')) {
- this.reportGenerator.recordStepMetric(appName, scenarioName, repetition, stepIndex, step.action, 'elapsed_time', elapsedTimeMs);
+ this.reportGenerator.recordStepMetric(appName, scenarioName, repetition, stepIndex, step.action, 'elapsed_time', elapsedTimeMs, scenarioStepIndex);
  }
  // Record any additional metrics returned by the handler
  if (handlerResult && typeof handlerResult === 'object') {
  for (const [metricName, metricValue] of Object.entries(handlerResult)) {
  if (step.metrics.includes(metricName)) {
- this.reportGenerator.recordStepMetric(appName, scenarioName, repetition, stepIndex, step.action, metricName, metricValue);
+ this.reportGenerator.recordStepMetric(appName, scenarioName, repetition, stepIndex, step.action, metricName, metricValue, scenarioStepIndex);
  }
  }
  }
@@ -1266,10 +1266,14 @@ export class VoiceAgentTester {
  }

  // Execute all configured steps
+ const appStepCount = appSteps.length;
  for (let i = 0; i < steps.length; i++) {
  const step = steps[i];
  console.log(`Executing step ${i + 1}: ${JSON.stringify(step)}`);
+ // For scenario steps (after app steps), pass the 1-based scenario step index
+ // so metrics can be aligned across providers with different app setup steps
+ const scenarioStepIndex = i >= appStepCount ? (i - appStepCount + 1) : null;
+ await this.executeStep(step, i, appName, scenarioName, repetition, scenarioStepIndex);
  }

  // Keep the browser open for a bit after all steps
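The index arithmetic in the hunk above is the heart of the alignment fix: steps run as `[app steps..., scenario steps...]`, and only scenario steps get a 1-based scenario index, so providers with different numbers of setup steps still line up. As a hypothetical extracted helper (the function name is mine):

```javascript
// Sketch of the absolute-index to scenario-step mapping from the diff above.
// App steps (absolute index < appStepCount) map to null; scenario steps map
// to a 1-based index independent of how many app steps preceded them.
function toScenarioStepIndex(absoluteIndex, appStepCount) {
  return absoluteIndex >= appStepCount ? absoluteIndex - appStepCount + 1 : null;
}
```

With 5 app steps, absolute index 8 maps to scenario step 4; with 3 app steps, absolute index 6 maps to the same scenario step 4, which is exactly the pairing the new tests below exercise.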
@@ -1,5 +1,6 @@
  import { describe, test, expect, beforeEach, afterEach } from '@jest/globals';
  import { VoiceAgentTester } from '../src/voice-agent-tester.js';
+ import { ReportGenerator } from '../src/report.js';
  import fs from 'fs';
  import path from 'path';

@@ -187,4 +188,136 @@ describe('VoiceAgentTester', () => {
  await expect(tester.executeStep({ action: 'speak' }, 0, 'scenario'))
  .rejects.toThrow('No text or file specified for speak action');
  });
+ });
+
+ describe('ReportGenerator - Comparison Step Alignment', () => {
+ test('should align metrics by scenario step index across providers with different app steps', () => {
+ // Simulate: Vapi has 5 app steps, Telnyx has 3 app steps
+ // Both share the same 7 scenario steps with metrics on scenario steps 4 and 7
+ const providerReport = new ReportGenerator('/tmp/test_provider.csv');
+ const telnyxReport = new ReportGenerator('/tmp/test_telnyx.csv');
+
+ // Provider (Vapi): 5 app steps + 7 scenario steps = 12 total
+ // Metric steps at absolute indices 8 (scenario step 4) and 11 (scenario step 7)
+ providerReport.beginRun('vapi', 'appointment', 0);
+ providerReport.recordStepMetric('vapi', 'appointment', 0, 8, 'wait_for_voice', 'elapsed_time', 2849, 4);
+ providerReport.recordStepMetric('vapi', 'appointment', 0, 11, 'wait_for_voice', 'elapsed_time', 3307, 7);
+ providerReport.endRun('vapi', 'appointment', 0);
+
+ // Telnyx: 3 app steps + 7 scenario steps = 10 total
+ // Metric steps at absolute indices 6 (scenario step 4) and 9 (scenario step 7)
+ telnyxReport.beginRun('telnyx', 'appointment', 0);
+ telnyxReport.recordStepMetric('telnyx', 'appointment', 0, 6, 'wait_for_voice', 'elapsed_time', 1552, 4);
+ telnyxReport.recordStepMetric('telnyx', 'appointment', 0, 9, 'wait_for_voice', 'elapsed_time', 704, 7);
+ telnyxReport.endRun('telnyx', 'appointment', 0);
+
+ // Get scenario-aligned metrics
+ const providerMetrics = providerReport.getAggregatedMetricsByScenarioStep();
+ const telnyxMetrics = telnyxReport.getAggregatedMetricsByScenarioStep();
+
+ // Both should have metrics at scenario steps 4 and 7
+ expect(providerMetrics.has(4)).toBe(true);
+ expect(providerMetrics.has(7)).toBe(true);
+ expect(telnyxMetrics.has(4)).toBe(true);
+ expect(telnyxMetrics.has(7)).toBe(true);
+
+ // Verify values are correct
+ expect(providerMetrics.get(4).get('elapsed_time').avg).toBe(2849);
+ expect(providerMetrics.get(7).get('elapsed_time').avg).toBe(3307);
+ expect(telnyxMetrics.get(4).get('elapsed_time').avg).toBe(1552);
+ expect(telnyxMetrics.get(7).get('elapsed_time').avg).toBe(704);
+
+ // The comparison should now have 2 comparable steps (not 4 separate unmatched ones)
+ const allScenarioSteps = new Set([
+ ...providerMetrics.keys(),
+ ...telnyxMetrics.keys()
+ ]);
+ expect(allScenarioSteps.size).toBe(2);
+ });
+
+ test('should generate comparison summary with single headline number', () => {
+ const providerReport = new ReportGenerator('/tmp/test_provider.csv');
+ const telnyxReport = new ReportGenerator('/tmp/test_telnyx.csv');
+
+ providerReport.beginRun('vapi', 'appointment', 0);
+ providerReport.recordStepMetric('vapi', 'appointment', 0, 8, 'wait_for_voice', 'elapsed_time', 2849, 4);
+ providerReport.recordStepMetric('vapi', 'appointment', 0, 11, 'wait_for_voice', 'elapsed_time', 3307, 7);
+ providerReport.endRun('vapi', 'appointment', 0);
+
+ telnyxReport.beginRun('telnyx', 'appointment', 0);
+ telnyxReport.recordStepMetric('telnyx', 'appointment', 0, 6, 'wait_for_voice', 'elapsed_time', 1552, 4);
+ telnyxReport.recordStepMetric('telnyx', 'appointment', 0, 9, 'wait_for_voice', 'elapsed_time', 704, 7);
+ telnyxReport.endRun('telnyx', 'appointment', 0);
+
+ // Capture console output
+ const logs = [];
+ const originalLog = console.log;
+ console.log = (msg) => logs.push(msg);
+
+ ReportGenerator.generateComparisonSummary(providerReport, telnyxReport, 'vapi');
+
+ console.log = originalLog;
+
+ const output = logs.join('\n');
+
+ // Should show averaged headline numbers: vapi avg = (2849+3307)/2 = 3078, telnyx avg = (1552+704)/2 = 1128
+ expect(output).toContain('3078ms');
+ expect(output).toContain('1128ms');
+ // Should show "2 matched responses"
+ expect(output).toContain('2 matched responses');
+ // Should declare Telnyx the winner
+ expect(output).toContain('🏆 Telnyx');
+ // Should NOT contain per-response breakdown without debug
+ expect(output).not.toContain('Per-response breakdown');
+ expect(output).not.toContain('#1');
+ expect(output).not.toContain('#2');
+ });
+
+ test('should show per-response breakdown with debug flag', () => {
+ const providerReport = new ReportGenerator('/tmp/test_provider.csv');
+ const telnyxReport = new ReportGenerator('/tmp/test_telnyx.csv');
+
+ providerReport.beginRun('vapi', 'appointment', 0);
+ providerReport.recordStepMetric('vapi', 'appointment', 0, 8, 'wait_for_voice', 'elapsed_time', 2849, 4);
+ providerReport.recordStepMetric('vapi', 'appointment', 0, 11, 'wait_for_voice', 'elapsed_time', 3307, 7);
+ providerReport.endRun('vapi', 'appointment', 0);
+
+ telnyxReport.beginRun('telnyx', 'appointment', 0);
+ telnyxReport.recordStepMetric('telnyx', 'appointment', 0, 6, 'wait_for_voice', 'elapsed_time', 1552, 4);
+ telnyxReport.recordStepMetric('telnyx', 'appointment', 0, 9, 'wait_for_voice', 'elapsed_time', 704, 7);
+ telnyxReport.endRun('telnyx', 'appointment', 0);
+
+ const logs = [];
+ const originalLog = console.log;
+ console.log = (msg) => logs.push(msg);
+
+ ReportGenerator.generateComparisonSummary(providerReport, telnyxReport, 'vapi', { debug: true });
+
+ console.log = originalLog;
+
+ const output = logs.join('\n');
+
+ // Should contain per-response breakdown
+ expect(output).toContain('Per-response breakdown');
+ expect(output).toContain('#1');
+ expect(output).toContain('#2');
+ expect(output).toContain('2849ms');
+ expect(output).toContain('1552ms');
+ expect(output).toContain('3307ms');
+ expect(output).toContain('704ms');
+ // Should ALSO contain the headline average
+ expect(output).toContain('3078ms');
+ expect(output).toContain('1128ms');
+ });
+
+ test('getAggregatedMetricsByScenarioStep returns empty map when no scenario steps', () => {
+ const report = new ReportGenerator('/tmp/test.csv');
+ report.beginRun('test', 'scenario', 0);
+ // Record without scenarioStepIndex (app step)
+ report.recordStepMetric('test', 'scenario', 0, 0, 'click', 'elapsed_time', 100);
+ report.endRun('test', 'scenario', 0);
+
+ const metrics = report.getAggregatedMetricsByScenarioStep();
+ expect(metrics.size).toBe(0);
+ });
  });