npm - @telnyx/voice-agent-tester - Versions diffs - 0.3.0 → 0.4.1 - Mend

@telnyx/voice-agent-tester 0.3.0 → 0.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

package/.agent/workflows/ralph-loop.md +62 -0
package/.gemini/skills/ralph-loop/SKILL.md +240 -0
package/.github/workflows/draft-release.yml +39 -4
package/.github/workflows/publish-release.yml +2 -2
package/CHANGELOG.md +19 -0
package/README.md +99 -2
package/{benchmarks/applications → applications}/elevenlabs.yaml +5 -2
package/applications/livetok.yaml +16 -0
package/{benchmarks/applications → applications}/telnyx.yaml +1 -1
package/applications/vapi.yaml +19 -0
package/assets/appointment_data_with_noise.mp3 +0 -0
package/assets/hello_make_an_appointment_with_noise.mp3 +0 -0
package/javascript/audio_input_hooks.js +104 -0
package/package.json +1 -1
package/{benchmarks/scenarios → scenarios}/appointment.yaml +0 -2
package/scenarios/appointment_with_noise.yaml +17 -0
package/src/index.js +92 -6
package/src/provider-import.js +1 -1
package/src/voice-agent-tester.js +53 -3
package/assets/confirmation.mp3 +0 -0
package/assets/greet_me_angry.mp3 +0 -0
package/assets/name_lebron_james.mp3 +0 -0
package/assets/tell_me_joke_laugh.mp3 +0 -0
package/assets/tell_me_something_funny.mp3 +0 -0
package/assets/tell_me_something_sad.mp3 +0 -0
package/benchmarks/applications/vapi.yaml +0 -10

package/.agent/workflows/ralph-loop.md ADDED

@@ -0,0 +1,62 @@
+---
+description: Ralph Loop - Iterative AI development with persistent iteration until task completion
+---
+# Ralph Loop Workflow
+This workflow implements the Ralph Loop (Ralph Wiggum) technique for iterative, autonomous coding.
+## Usage
+Invoke with: `/ralph-loop <task description>`
+Or provide detailed options:
+```
+/ralph-loop "Build feature X" --max-iterations 30 --completion-promise "COMPLETE"
+```
+## Workflow Steps
+1. **Read the Ralph Loop skill instructions**
+   - View the skill file at `.gemini/skills/ralph-loop/SKILL.md`
+   - Understand the iteration pattern and best practices
+2. **Parse the user's task**
+   - Identify the main objective
+   - Extract success criteria
+   - Set max iterations (default: 30)
+   - Set completion promise (default: "COMPLETE")
+3. **Enter the loop**
+   - Execute the task iteratively
+   - Self-correct on failures
+   - Track progress
+   - Continue until success criteria met or max iterations reached
+4. **Report completion**
+   - Summarize accomplishments
+   - Output the completion promise
+   - List any remaining issues
+## Quick Commands
+- **Start a loop**: `/ralph-loop "Your task here"`
+- **Cancel loop**: Say "stop", "cancel", or "abort"
+- **Check skill docs**: View `.gemini/skills/ralph-loop/SKILL.md`
+## Examples
+### Feature Implementation
+```
+/ralph-loop "Implement user authentication with JWT tokens. Requirements: login/logout endpoints, password hashing, token refresh. Tests must pass."
+```
+### Bug Fix
+```
+/ralph-loop "Fix the 404 error when importing VAPI assistants. Add retry logic with exponential backoff."
+```
+### Refactoring
+```
+/ralph-loop "Refactor the CLI options to be more provider-agnostic. All existing tests must pass."
+```

package/.gemini/skills/ralph-loop/SKILL.md ADDED

@@ -0,0 +1,240 @@
+---
+name: ralph-loop
+description: Ralph Loop - AI Loop Technique for iterative, autonomous coding. Implements persistent iteration until task completion with self-correction patterns.
+---
+# Ralph Loop - AI Loop Technique
+The Ralph Loop (also known as "Ralph Wiggum") is an iterative AI development methodology. It embodies the philosophy of **persistent iteration despite setbacks**.
+## Core Philosophy
+1. **Iteration > Perfection**: Don't aim for perfect on first try. Let the loop refine the work.
+2. **Failures Are Data**: Deterministically bad means failures are predictable and informative.
+3. **Operator Skill Matters**: Success depends on writing good prompts, not just having a good model.
+4. **Persistence Wins**: Keep trying until success. Handle retry logic automatically.
+---
+## How to Use This Skill
+When the user invokes this skill (e.g., `/ralph-loop` or asks for iterative development), follow these instructions:
+### Step 1: Understand the Task
+Parse the user's request and identify:
+- **The main objective** - What needs to be built/fixed/refactored
+- **Success criteria** - How to know when it's complete
+- **Max iterations** - Safety limit (default: 30)
+- **Completion promise** - The signal word (default: "COMPLETE")
+### Step 2: Enter the Ralph Loop
+Execute the following loop pattern:
+```
+ITERATION = 1
+MAX_ITERATIONS = [specified or 30]
+COMPLETION_PROMISE = [specified or "COMPLETE"]
+WHILE (ITERATION <= MAX_ITERATIONS) AND (NOT COMPLETED):
+    1. Assess current state
+    2. Identify next step toward goal
+    3. Execute the step (write code, run tests, fix bugs, etc.)
+    4. Evaluate results
+    5. If success criteria met → output COMPLETION_PROMISE → EXIT LOOP
+    6. If not complete → increment ITERATION → CONTINUE
+    7. If blocked → document issue → try alternative approach
+END WHILE
+IF MAX_ITERATIONS reached without completion:
+    - Document what was accomplished
+    - List blocking issues
+    - Suggest next steps
+```
+### Step 3: Self-Correction Pattern
+During each iteration, follow this TDD-inspired pattern:
+1. **Plan** - Identify what needs to happen next
+2. **Execute** - Make the change (code, config, etc.)
+3. **Verify** - Run tests, check results, validate
+4. **If failing** - Debug and fix in the same iteration if possible
+5. **If passing** - Move to next requirement
+6. **Refactor** - Clean up if needed before proceeding
+### Step 4: Report Progress
+After each significant iteration, briefly report:
+- Current iteration number
+- What was attempted
+- Result (success/failure/partial)
+- Next step
+### Step 5: Completion
+When all success criteria are met:
+1. Summarize what was accomplished
+2. List any tests/validations that passed
+3. Output the completion promise: `<promise>COMPLETE</promise>`
+---
+## Prompt Templates
+### Feature Implementation
+```
+Implement [FEATURE_NAME].
+Requirements:
+- [Requirement 1]
+- [Requirement 2]
+- [Requirement 3]
+Success criteria:
+- All requirements implemented
+- Tests passing with >80% coverage
+- No linter errors
+- Documentation updated
+Output <promise>COMPLETE</promise> when done.
+```
+### TDD Development
+```
+Implement [FEATURE] using TDD.
+Process:
+1. Write failing test for next requirement
+2. Implement minimal code to pass
+3. Run tests
+4. If failing, fix and retry
+5. Refactor if needed
+6. Repeat for all requirements
+Requirements: [LIST]
+Output <promise>DONE</promise> when all tests green.
+```
+### Bug Fixing
+```
+Fix bug: [DESCRIPTION]
+Steps:
+1. Reproduce the bug
+2. Identify root cause
+3. Implement fix
+4. Write regression test
+5. Verify fix works
+6. Check no new issues introduced
+After 15 iterations if not fixed:
+- Document blocking issues
+- List attempted approaches
+- Suggest alternatives
+Output <promise>FIXED</promise> when resolved.
+```
+### Refactoring
+```
+Refactor [COMPONENT] for [GOAL].
+Constraints:
+- All existing tests must pass
+- No behavior changes
+- Incremental commits
+Checklist:
+- [ ] Tests passing before start
+- [ ] Apply refactoring step
+- [ ] Tests still passing
+- [ ] Repeat until done
+Output <promise>REFACTORED</promise> when complete.
+```
+---
+## Advanced Patterns
+### Multi-Phase Development
+For complex projects, chain multiple loops:
+```
+Phase 1: Core implementation → <promise>PHASE1_DONE</promise>
+Phase 2: API layer → <promise>PHASE2_DONE</promise>
+Phase 3: Frontend → <promise>PHASE3_DONE</promise>
+```
+### Incremental Goals
+Break large tasks into phases:
+```
+Phase 1: User authentication (JWT, tests)
+Phase 2: Product catalog (list/search, tests)
+Phase 3: Shopping cart (add/remove, tests)
+Output <promise>COMPLETE</promise> when all phases done.
+```
+---
+## Best Practices for Writing Prompts
+### ❌ Bad Prompt
+```
+Build a todo API and make it good.
+```
+### ✅ Good Prompt
+```
+Build a REST API for todos.
+When complete:
+- All CRUD endpoints working
+- Input validation in place
+- Tests passing (coverage > 80%)
+- README with API docs
+Output: <promise>COMPLETE</promise>
+```
+---
+## When to Use Ralph Loop
+### ✅ Good For:
+- Feature implementation with clear requirements
+- Bug fixing with reproducible issues
+- Refactoring with existing test coverage
+- TDD-style development
+- Tasks that benefit from iteration
+### ❌ Not Good For:
+- Exploratory research without clear goals
+- Tasks requiring human judgment at each step
+- Real-time interactive sessions
+- Tasks with no verifiable success criteria
+---
+## Cancellation
+The user can cancel the loop at any time by:
+- Saying "stop", "cancel", or "abort"
+- Providing new instructions that supersede the current task
+---
+## Attribution
+Based on the Ralph Wiggum technique from [Awesome Claude](https://awesomeclaude.ai/ralph-wiggum) and the official Claude plugins marketplace (`ralph-loop@claude-plugins-official`).
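
The loop pattern described in Step 2 of this skill file can be sketched in JavaScript roughly as follows. This is only an illustration of the control flow, not code from the package: `ralphLoop` and the `runStep` callback are hypothetical names, and the callback is assumed to report whether the success criteria have been met.

```javascript
// Minimal sketch of the iteration pattern from SKILL.md.
// `runStep` is a hypothetical callback: it performs one iteration and
// returns { done, note }, where `done` means the success criteria are met.
function ralphLoop(runStep, { maxIterations = 30, completionPromise = 'COMPLETE' } = {}) {
  const log = [];
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const result = runStep(iteration);
    log.push({ iteration, note: result.note });
    if (result.done) {
      // Success: emit the completion promise and leave the loop.
      return {
        completed: true,
        iterations: iteration,
        output: `<promise>${completionPromise}</promise>`,
        log,
      };
    }
  }
  // Safety limit reached: the log documents what was attempted.
  return { completed: false, iterations: maxIterations, output: null, log };
}

// Example: a step that succeeds on its third attempt.
const outcome = ralphLoop((i) => ({ done: i === 3, note: `attempt ${i}` }), { maxIterations: 5 });
console.log(outcome.output); // <promise>COMPLETE</promise>
```

The iteration cap plays the role of `MAX_ITERATIONS` in the pseudocode, and the returned `log` corresponds to the "document what was accomplished" branch.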

package/.github/workflows/draft-release.yml CHANGED

@@ -46,8 +46,8 @@ jobs:
       - name: Setup Git user
         run: |
-          git config user.name TelnyxIntegrations
-          git config user.email integrations@telnyx.com
+          git config user.name "github-actions[bot]"
+          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
       - name: Use Node.js 20.x
         uses: actions/setup-node@v4
@@ -64,9 +64,44 @@ jobs:
         env:
           CI: true
-      - name: Create draft release
+      - name: Determine next version
+        id: version
         run: |
-          npx release-it --ci --github.draft --no-npm.publish${{ env.INCREMENT_ARG }}${{ env.PRERELEASE_ARGS }}
+          NEXT=$(npx release-it --ci --release-version${{ env.INCREMENT_ARG }}${{ env.PRERELEASE_ARGS }} 2>/dev/null)
+          echo "next=$NEXT" >> "$GITHUB_OUTPUT"
+          echo "branch=release/v$NEXT" >> "$GITHUB_OUTPUT"
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      - name: Create release branch
+        run: |
+          git checkout -b "${{ steps.version.outputs.branch }}"
+      - name: Create draft release on branch
+        run: |
+          npx release-it --ci --github.draft --no-npm.publish --no-git.push --no-git.requireUpstream${{ env.INCREMENT_ARG }}${{ env.PRERELEASE_ARGS }}
         env:
           NPM_TOKEN: ${{ secrets.NPM_CI_TOKEN }}
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      - name: Push release branch
+        run: |
+          git push origin "${{ steps.version.outputs.branch }}"
+      - name: Create pull request
+        run: |
+          gh pr create \
+            --title "chore: release v${{ steps.version.outputs.next }}" \
+            --body "## Release v${{ steps.version.outputs.next }}
+          Automated release PR created by the draft-release workflow.
+          - Version bump in \`package.json\`
+          - Updated \`CHANGELOG.md\`
+          - Draft GitHub release created
+          **After merging**, publish the release from the [releases page](https://github.com/${{ github.repository }}/releases)." \
+            --base "${{ env.TARGET_REF }}" \
+            --head "${{ steps.version.outputs.branch }}"
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

package/.github/workflows/publish-release.yml CHANGED

@@ -19,8 +19,8 @@ jobs:
       - name: Setup Git user
         run: |
-          git config user.name TelnyxIntegrations
-          git config user.email integrations@telnyx.com
+          git config user.name "github-actions[bot]"
+          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
       - name: Use Node.js 20.x
         uses: actions/setup-node@v4

package/CHANGELOG.md CHANGED

@@ -1,5 +1,24 @@
 # Changelog
+## [0.4.1](https://github.com/team-telnyx/voice-agent-tester/compare/v0.4.0...v0.4.1) (2026-02-18)
+### Features
+* require provider-specific params for comparison mode ([#10](https://github.com/team-telnyx/voice-agent-tester/issues/10)) ([db9eb27](https://github.com/team-telnyx/voice-agent-tester/commit/db9eb273c139374a9f6358126113cab92f8f5b32))
+* use Qwen/Qwen3-235B-A22B as model for imported assistants ([#11](https://github.com/team-telnyx/voice-agent-tester/issues/11)) ([3c4ed0a](https://github.com/team-telnyx/voice-agent-tester/commit/3c4ed0a14498833544f1797426b234585adcb49b))
+### Bug Fixes
+* add --no-git.requireUpstream to release-it in draft workflow ([#14](https://github.com/team-telnyx/voice-agent-tester/issues/14)) ([9553e65](https://github.com/team-telnyx/voice-agent-tester/commit/9553e65bdc6f0094853895da6b806befc5a898f6))
+* use triggering user as git author and create PR for releases ([#13](https://github.com/team-telnyx/voice-agent-tester/issues/13)) ([8ebecba](https://github.com/team-telnyx/voice-agent-tester/commit/8ebecba1839985949e46bec457f327711f89138d))
+## [0.4.0](https://github.com/team-telnyx/voice-agent-tester/compare/v0.3.0...v0.4.0) (2026-01-26)
+### Features
+* add audio input from URL for benchmark runs ([c347de8](https://github.com/team-telnyx/voice-agent-tester/commit/c347de83b8318827bac098bff4328502908ee981))
+* add background noise benchmark with pre-mixed audio files ([9f64179](https://github.com/team-telnyx/voice-agent-tester/commit/9f6417936514451270c4d1bc929771446c366b08))
 ## [0.3.0](https://github.com/team-telnyx/voice-agent-tester/compare/v0.2.3...v0.3.0) (2026-01-23)
 ### Features

package/README.md CHANGED

@@ -40,10 +40,11 @@ voice-agent-tester -a applications/telnyx.yaml -s scenarios/appointment.yaml --a
 | `-c, --concurrency` | `1` | Number of parallel tests |
 | `-r, --report` | | Generate CSV report to specified file |
 | `-p, --params` | | URL template params (e.g., `key=value,key2=value2`) |
-| `--record` | `false` | Record video and audio in webm format |
 | `--application-tags` | | Filter applications by comma-separated tags |
 | `--scenario-tags` | | Filter scenarios by comma-separated tags |
 | `--assets-server` | `http://localhost:3333` | Assets server URL |
+| `--audio-url` | | URL to audio file to play as input during entire benchmark |
+| `--audio-volume` | `1.0` | Volume level for audio input (0.0 to 1.0) |
 ## Bundled Configs
@@ -55,7 +56,103 @@ voice-agent-tester -a applications/telnyx.yaml -s scenarios/appointment.yaml --a
 | `applications/retell.yaml` | Retell |
 | `applications/livetok.yaml` | Livetok |
-Scenario: `scenarios/appointment.yaml`
+Scenarios:
+- `scenarios/appointment.yaml` - Basic appointment booking test
+- `scenarios/appointment_with_noise.yaml` - Appointment with background noise (pre-mixed audio)
+## Background Noise Testing
+Test voice agents' performance with ambient noise (e.g., crowd chatter, cafe environment). Background noise is pre-mixed into audio files to simulate real-world conditions where users speak to voice agents in noisy environments.
+### Running with Background Noise
+```bash
+# Telnyx with background noise
+npx @telnyx/voice-agent-tester@latest \
+  -a applications/telnyx.yaml \
+  -s scenarios/appointment_with_noise.yaml \
+  --assistant-id <YOUR_ASSISTANT_ID>
+# Compare with no noise (same assistant)
+npx @telnyx/voice-agent-tester@latest \
+  -a applications/telnyx.yaml \
+  -s scenarios/appointment.yaml \
+  --assistant-id <YOUR_ASSISTANT_ID>
+# Generate CSV report with metrics
+npx @telnyx/voice-agent-tester@latest \
+  -a applications/telnyx.yaml \
+  -s scenarios/appointment_with_noise.yaml \
+  --assistant-id <YOUR_ASSISTANT_ID> \
+  -r output/noise_benchmark.csv
+```
+### Custom Audio Input from URL
+Play any audio file from a URL as input throughout the entire benchmark run. The audio is sent to the voice agent as microphone input.
+```bash
+# Use custom audio input from URL
+npx @telnyx/voice-agent-tester@latest \
+  -a applications/telnyx.yaml \
+  -s scenarios/appointment.yaml \
+  --assistant-id <YOUR_ASSISTANT_ID> \
+  --audio-url "https://example.com/test-audio.mp3" \
+  --audio-volume 0.8
+```
+This is useful for:
+- Testing with custom audio inputs
+- Using longer audio tracks that play throughout the benchmark
+- A/B testing different audio sources
+### Bundled Audio Files
+| File | Description |
+|------|-------------|
+| `hello_make_an_appointment.mp3` | Clean appointment request |
+| `hello_make_an_appointment_with_noise.mp3` | Appointment request with crowd noise |
+| `appointment_data.mp3` | Clean appointment details |
+| `appointment_data_with_noise.mp3` | Appointment details with crowd noise |
+### Scenario Configuration
+The noise scenario uses pre-mixed audio files:
+```yaml
+# scenarios/appointment_with_noise.yaml
+tags:
+  - default
+  - noise
+steps:
+  - action: wait_for_voice
+  - action: wait_for_silence
+  - action: sleep
+    time: 1000
+  - action: speak
+    file: hello_make_an_appointment_with_noise.mp3
+  - action: wait_for_voice
+    metrics: elapsed_time
+  - action: wait_for_silence
+  - action: speak
+    file: appointment_data_with_noise.mp3
+  - action: wait_for_voice
+    metrics: elapsed_time
+```
+### Metrics and Reports
+The benchmark collects response latency metrics at each `wait_for_voice` step with `metrics: elapsed_time`. Generated CSV reports include:
+```csv
+app, scenario, repetition, success, duration, step_9_wait_for_voice_elapsed_time, step_12_wait_for_voice_elapsed_time
+telnyx, appointment_with_noise, 0, 1, 29654, 1631, 1225
+```
+Compare results with and without noise to measure how background noise affects your voice agent's:
+- Response latency
+- Speech recognition accuracy
+- Overall conversation flow
 ## Examples
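
The README's suggestion to compare results with and without noise could be carried out with a small script along these lines. It is a sketch under assumptions: the report is assumed to use the comma-plus-space layout of the README sample, `meanLatencyByScenario` is a hypothetical helper, and the second data row is made up for illustration (only the first row appears in the README).

```javascript
// Sketch: average the *_elapsed_time columns of a CSV report, per scenario.
function meanLatencyByScenario(csvText) {
  const [headerLine, ...rows] = csvText.trim().split('\n');
  const header = headerLine.split(',').map((h) => h.trim());
  const scenarioIdx = header.indexOf('scenario');
  // Indices of every latency column, e.g. step_9_wait_for_voice_elapsed_time.
  const latencyIdx = header
    .map((name, i) => (name.endsWith('_elapsed_time') ? i : -1))
    .filter((i) => i >= 0);

  const stats = {};
  for (const row of rows) {
    const cells = row.split(',').map((c) => c.trim());
    const scenario = cells[scenarioIdx];
    if (!stats[scenario]) stats[scenario] = { sum: 0, count: 0 };
    for (const i of latencyIdx) {
      // Skip empty cells (a step that recorded no metric).
      if (cells[i] === undefined || cells[i] === '') continue;
      const value = Number(cells[i]);
      if (Number.isFinite(value)) {
        stats[scenario].sum += value;
        stats[scenario].count += 1;
      }
    }
  }
  return Object.fromEntries(
    Object.entries(stats).map(([name, { sum, count }]) => [name, count ? sum / count : NaN])
  );
}

const sampleReport = [
  'app, scenario, repetition, success, duration, step_9_wait_for_voice_elapsed_time, step_12_wait_for_voice_elapsed_time',
  'telnyx, appointment_with_noise, 0, 1, 29654, 1631, 1225', // row from the README
  'telnyx, appointment, 0, 1, 27000, 1200, 1000', // made-up clean-audio row
].join('\n');

const latency = meanLatencyByScenario(sampleReport);
console.log(latency); // { appointment_with_noise: 1428, appointment: 1100 }
```

With real reports, running both scenarios with a higher repeat count should give more stable averages than the single rows used here.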

package/{benchmarks/applications → applications}/elevenlabs.yaml RENAMED

@@ -1,10 +1,13 @@
 url: "https://elevenlabs.io/app/talk-to?agent_id={{assistantId}}&branch_id={{branchId}}"
+tags:
+  - provider
+  - elevenlabs
 steps:
   - action: wait_for_element
-    selector: "button[data-agent-id]"
+    selector: "text=Call AI agent"
   - action: sleep
     time: 3000
   - action: click
-    selector: "button[data-agent-id]"
+    selector: "text=Call AI agent"
   - action: sleep
     time: 2000

package/applications/livetok.yaml ADDED

@@ -0,0 +1,16 @@
+url: "https://rti.livetok.io/demo/index.html"
+tags:
+  - default
+  - basic
+steps:
+  - action: fill
+    selector: "input[type='password']"
+    text: "GOOGLE_API_KEY HERE"
+  # - action: select
+  #   selector: "#model"
+  #   value: "gemini-2.5-flash-preview-native-audio-dialog"
+  # - action: fill
+  #   selector: "#tools"
+  #   text: "[]"
+  - action: click
+    selector: "#start"

package/{benchmarks/applications → applications}/telnyx.yaml RENAMED

@@ -5,6 +5,6 @@ steps:
   - action: sleep
     time: 3000
   - action: click
-    selector: "telnyx-ai-agent"
+    selector: "telnyx-ai-agent >>> button"
   - action: sleep
     time: 4000

package/applications/vapi.yaml ADDED

@@ -0,0 +1,19 @@
+url: "https://vapi.ai?demo=true&shareKey={{shareKey}}&assistantId={{assistantId}}"
+tags:
+  - provider
+  - vapi
+steps:
+  - action: wait_for_element
+    selector: "button[aria-label=\"Talk to Vapi\"]"
+  - action: sleep
+    time: 5000
+  - action: click
+    selector: "button[aria-label=\"Talk to Vapi\"]"
+  - action: sleep
+    time: 2000
+  - action: speak
+    text: "Hello, what can you do?"
+  - action: wait_for_voice
+    metrics: elapsed_time
+  - action: wait_for_silence
+    metrics: elapsed_time

package/assets/appointment_data_with_noise.mp3 ADDED

Binary file

package/assets/hello_make_an_appointment_with_noise.mp3 ADDED

Binary file

package/javascript/audio_input_hooks.js CHANGED

@@ -289,3 +289,107 @@ window.__waitForMediaStream = function (timeout = 10000) {
   });
 };
+// ============= AUDIO INPUT FROM URL =============
+// For playing audio from a URL as input during the entire benchmark
+let urlAudioElement = null;
+let urlAudioSourceNode = null;
+let urlAudioGainNode = null;
+// Start playing audio from URL (sent as microphone input)
+window.__startAudioFromUrl = function (url, volume = 1.0) {
+  console.log(`🔊 Starting audio from URL: ${url} (volume: ${volume})`);
+  if (!globalAudioContext) {
+    console.error('AudioContext not initialized');
+    return Promise.reject(new Error('AudioContext not initialized'));
+  }
+  // Stop any existing URL audio
+  window.__stopAudioFromUrl();
+  return new Promise((resolve, reject) => {
+    urlAudioElement = new Audio(url);
+    urlAudioElement.crossOrigin = 'anonymous';
+    urlAudioElement.loop = true;
+    urlAudioElement.addEventListener('canplaythrough', function onCanPlay() {
+      urlAudioElement.removeEventListener('canplaythrough', onCanPlay);
+      try {
+        // Create media element source
+        urlAudioSourceNode = globalAudioContext.createMediaElementSource(urlAudioElement);
+        // Create gain node for volume control
+        urlAudioGainNode = globalAudioContext.createGain();
+        urlAudioGainNode.gain.setValueAtTime(volume, globalAudioContext.currentTime);
+        // Connect: source -> gain -> all MediaStreams
+        urlAudioSourceNode.connect(urlAudioGainNode);
+        // Connect to all MediaStream gain nodes (sent as microphone input)
+        mediaStreams.forEach((streamData) => {
+          urlAudioGainNode.connect(streamData.gainNode);
+          console.log(`🔊 Connected URL audio to stream ${streamData.id}`);
+        });
+        // Also make audible through speakers if speak audio is audible
+        if (MAKE_SPEAK_AUDIO_AUDIBLE) {
+          urlAudioGainNode.connect(globalAudioContext.destination);
+        }
+        // Start playing
+        urlAudioElement.play().then(() => {
+          console.log('🔊 Audio from URL started playing');
+          if (typeof __publishEvent === 'function') {
+            __publishEvent('urlaudiostart', { url: url, volume: volume });
+          }
+          resolve();
+        }).catch(reject);
+      } catch (error) {
+        console.error('Error setting up audio from URL:', error);
+        reject(error);
+      }
+    });
+    urlAudioElement.addEventListener('error', function (event) {
+      console.error('URL audio error:', event);
+      reject(new Error('Failed to load audio from URL'));
+    });
+    urlAudioElement.load();
+  });
+};
+// Stop audio from URL
+window.__stopAudioFromUrl = function () {
+  if (urlAudioElement) {
+    console.log('🔊 Stopping audio from URL');
+    urlAudioElement.pause();
+    urlAudioElement.currentTime = 0;
+    urlAudioElement = null;
+  }
+  if (urlAudioSourceNode) {
+    try {
+      urlAudioSourceNode.disconnect();
+    } catch (e) {
+      // Already disconnected
+    }
+    urlAudioSourceNode = null;
+  }
+  if (urlAudioGainNode) {
+    try {
+      urlAudioGainNode.disconnect();
+    } catch (e) {
+      // Already disconnected
+    }
+    urlAudioGainNode = null;
+  }
+  if (typeof __publishEvent === 'function') {
+    __publishEvent('urlaudiostop', {});
+  }
+};

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@telnyx/voice-agent-tester",
-  "version": "0.3.0",
+  "version": "0.4.1",
   "description": "A command-line tool to test voice agents using Puppeteer",
   "main": "src/index.js",
   "type": "module",

package/{benchmarks/scenarios → scenarios}/appointment.yaml RENAMED

@@ -3,8 +3,6 @@ tags:
 steps:
   - action: wait_for_voice
   - action: wait_for_silence
-  - action: sleep
-    time: 1000
   - action: speak
     file: hello_make_an_appointment.mp3
   - action: wait_for_voice

package/scenarios/appointment_with_noise.yaml ADDED

@@ -0,0 +1,17 @@
+tags:
+  - default
+  - noise
+steps:
+  - action: wait_for_voice
+  - action: wait_for_silence
+  - action: sleep
+    time: 1000
+  - action: speak
+    file: hello_make_an_appointment_with_noise.mp3
+  - action: wait_for_voice
+    metrics: elapsed_time
+  - action: wait_for_silence
+  - action: speak
+    file: appointment_data_with_noise.mp3
+  - action: wait_for_voice
+    metrics: elapsed_time

package/src/index.js CHANGED

@@ -87,6 +87,50 @@ function substituteUrlParams(url, params) {
   return result;
 }
+/**
+ * Get the list of missing provider-specific parameters required for comparison mode.
+ * Each provider has its own set of required params for the direct widget benchmark.
+ *
+ * @param {Object} argv - Parsed CLI arguments
+ * @returns {Array<{key: string, flag: string, description: string}>} Missing params
+ */
+function getCompareRequiredParams(argv) {
+  const missing = [];
+  switch (argv.provider) {
+    case 'vapi':
+      if (!argv.shareKey) {
+        missing.push({ key: 'shareKey', flag: '--share-key', description: 'Vapi share key' });
+      }
+      break;
+    case 'elevenlabs':
+      if (!argv.branchId) {
+        missing.push({ key: 'branchId', flag: '--branch-id', description: 'ElevenLabs branch ID' });
+      }
+      break;
+    // retell and others: no extra params needed yet
+  }
+  return missing;
+}
+/**
+ * Get provider-specific template parameters for comparison mode URL/HTML substitution.
+ *
+ * @param {Object} argv - Parsed CLI arguments
+ * @returns {Object} Template params to merge into provider params
+ */
+function getCompareTemplateParams(argv) {
+  switch (argv.provider) {
+    case 'vapi':
+      return { shareKey: argv.shareKey };
+    case 'elevenlabs':
+      return { branchId: argv.branchId };
+    default:
+      return {};
+  }
+}
 // Helper function to load and validate application config
 function loadApplicationConfig(configPath, params = {}) {
   const configFile = fs.readFileSync(configPath, 'utf8');
@@ -118,7 +162,6 @@ function loadScenarioConfig(configPath) {
     name: path.basename(configPath, path.extname(configPath)),
     path: configPath,
     steps: config.steps || [],
-    background: config.background || null,
     tags: config.tags || []
   };
 }
@@ -236,6 +279,14 @@ const argv = yargs(hideBin(process.argv))
     type: 'string',
     description: 'Provider assistant/agent ID to import (required with --provider)'
   })
+  .option('share-key', {
+    type: 'string',
+    description: 'Vapi share key for direct widget testing (required for comparison mode with --provider vapi)'
+  })
+  .option('branch-id', {
+    type: 'string',
+    description: 'ElevenLabs branch ID for direct widget testing (required for comparison mode with --provider elevenlabs)'
+  })
   .option('assistant-id', {
     type: 'string',
     description: 'Assistant/agent ID for direct benchmarking (works with all providers)'
@@ -256,6 +307,16 @@ const argv = yargs(hideBin(process.argv))
     description: 'Disable comparison benchmarks (run only Telnyx import)',
     default: false
   })
+  .option('audio-url', {
+    type: 'string',
+    description: 'URL to audio file to play as input during entire benchmark run',
+    default: null
+  })
+  .option('audio-volume', {
+    type: 'number',
+    description: 'Volume level for audio input (0.0 to 1.0)',
+    default: 1.0
+  })
   .help()
   .argv;
@@ -333,11 +394,13 @@ async function runBenchmark({ applications, scenarios, repeat, concurrency, argv
       assetsServerUrl: argv.assetsServer,
       reportGenerator: reportGenerator,
       record: argv.record,
-      debug: argv.debug
+      debug: argv.debug,
+      audioUrl: argv.audioUrl,
+      audioVolume: argv.audioVolume
     });
     try {
-      await tester.runScenario(targetUrl, app.steps, scenario.steps, app.name, scenario.name, repetition, scenario.background);
+      await tester.runScenario(targetUrl, app.steps, scenario.steps, app.name, scenario.name, repetition);
       console.log(`✅ Completed successfully (Run ${runNumber}/${totalRuns})`);
       return { success: true };
     } catch (error) {
@@ -445,8 +508,8 @@ async function main() {
     // Parse URL parameters for template substitution
     const params = parseParams(argv.params);
-    // Determine if we should run comparison benchmark
-    const shouldCompare = argv.provider && argv.compare && !argv.noCompare;
+    // Determine if we should run comparison benchmark (may be updated later if public key is missing)
+    let shouldCompare = argv.provider && argv.compare && !argv.noCompare;
     // Store credentials for potential comparison run
     let telnyxApiKey = argv.apiKey;
@@ -481,6 +544,29 @@ async function main() {
         }
       }
+      // Require provider-specific params when comparison mode is enabled
+      if (shouldCompare) {
+        const missingParams = getCompareRequiredParams(argv);
+        if (missingParams.length > 0) {
+          for (const param of missingParams) {
+            console.log(`\n🔑 ${param.description} is required for comparison mode`);
+            const inputVal = await promptUserInput(`Enter ${param.description} (or press Enter to skip comparison): `);
+            if (inputVal) {
+              argv[param.key] = inputVal;
+            } else {
+              console.warn(`⚠️  Missing ${param.flag}. Disabling comparison mode (--no-compare).`);
+              console.warn(`   To run comparison benchmarks, pass ${param.flag} <value>\n`);
+              argv.compare = false;
+              argv.noCompare = true;
+              break;
+            }
+          }
+        }
+      }
+      // Re-evaluate shouldCompare after potential public key prompt
+      shouldCompare = argv.provider && argv.compare && !argv.noCompare;
       const importResult = await importAssistantsFromProvider({
         provider: argv.provider,
         providerApiKey: providerApiKey,
@@ -572,7 +658,7 @@ async function main() {
       // Phase 1: Provider Direct Benchmark
       // Load provider-specific application config with provider assistant ID
-      const providerParams = { ...params, assistantId: providerImportId };
+      const providerParams = { ...params, assistantId: providerImportId, ...getCompareTemplateParams(argv) };
       const providerAppPath = path.resolve(__packageDir, 'applications', `${argv.provider}.yaml`);
       if (!fs.existsSync(providerAppPath)) {
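
The new comparison-mode checks exist because the provider application configs use `{{shareKey}}` and `{{branchId}}` placeholders in their URLs. A minimal sketch of why a missing value matters is shown below. The package's own `substituteUrlParams` body is not part of this diff, so `fillTemplate` is a hypothetical stand-in; its behaviour (URL-encoding values, leaving unknown placeholders untouched) is an assumption, as are the sample IDs.

```javascript
// Hypothetical stand-in for the package's substituteUrlParams
// (whose implementation is not shown in this diff).
function fillTemplate(url, params) {
  return url.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    params[key] !== undefined ? encodeURIComponent(String(params[key])) : match
  );
}

const vapiTemplate = 'https://vapi.ai?demo=true&shareKey={{shareKey}}&assistantId={{assistantId}}';

// Mirrors the merge in the diff: base params, then provider-specific ones.
const compareParams = { shareKey: 'demo-share-key' }; // illustrative value
const providerParams = { assistantId: 'asst_123', ...compareParams };

const vapiUrl = fillTemplate(vapiTemplate, providerParams);
console.log(vapiUrl);
// https://vapi.ai?demo=true&shareKey=demo-share-key&assistantId=asst_123

// Without the share key the placeholder survives, producing an unusable URL.
const brokenUrl = fillTemplate(vapiTemplate, { assistantId: 'asst_123' });
console.log(brokenUrl.includes('{{shareKey}}')); // true
```

Prompting for the missing value before the import starts, as the diff does, avoids reaching the provider benchmark phase with a URL like the second one.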

package/src/provider-import.js CHANGED

@@ -200,7 +200,7 @@ async function configureImportedAssistant({ assistantId, assistantName, telnyxAp
         },
         body: JSON.stringify({
           name: newName,
-          model: 'Qwen/Qwen3-235B-A22',
+          model: 'Qwen/Qwen3-235B-A22B',
           telephony_settings: {
             supports_unauthenticated_web_calls: true
           },

package/src/voice-agent-tester.js CHANGED

@@ -24,6 +24,8 @@ export class VoiceAgentTester {
     this.record = options.record || false;
     this.recordingStream = null;
     this.recordingFile = null;
+    this.audioUrl = options.audioUrl || null;
+    this.audioVolume = options.audioVolume || 1.0;
   }
   sleep(time) {
@@ -951,8 +953,6 @@ export class VoiceAgentTester {
     return screenshotPath;
   }
   async saveAudioAsWAV(base64Audio, audioMetadata) {
     try {
       // Convert base64 to buffer
@@ -978,7 +978,43 @@ export class VoiceAgentTester {
     }
   }
-  async runScenario(url, appSteps, scenarioSteps, appName = '', scenarioName = '', repetition = 1, backgroundFile = null) {
+  async startAudioFromUrl(audioUrl, volume = 1.0) {
+    console.log(`🔊 Starting audio from URL: ${audioUrl} (volume: ${volume})`);
+    try {
+      await this.page.evaluate(async (url, vol) => {
+        // Wait for media stream to be ready
+        if (typeof window.__waitForMediaStream === 'function') {
+          await window.__waitForMediaStream();
+        }
+        if (typeof window.__startAudioFromUrl === 'function') {
+          await window.__startAudioFromUrl(url, vol);
+        } else {
+          throw new Error('__startAudioFromUrl not available in browser context');
+        }
+      }, audioUrl, volume);
+      console.log(`🔊 Audio from URL started successfully`);
+    } catch (error) {
+      console.warn(`⚠️ Failed to start audio from URL: ${error.message}`);
+    }
+  }
+  async stopAudioFromUrl() {
+    try {
+      await this.page.evaluate(() => {
+        if (typeof window.__stopAudioFromUrl === 'function') {
+          window.__stopAudioFromUrl();
+        }
+      });
+      console.log(`🔊 Audio from URL stopped`);
+    } catch (error) {
+      // Ignore errors when stopping (page might be closed)
+    }
+  }
+  async runScenario(url, appSteps, scenarioSteps, appName = '', scenarioName = '', repetition = 1) {
     let success = true;
     try {
       // Start tracking this run with app and scenario names
@@ -1004,6 +1040,11 @@ export class VoiceAgentTester {
       // Start recording if enabled
       await this.startRecording(appName, scenarioName, repetition);
+      // Start audio from URL if specified via CLI
+      if (this.audioUrl) {
+        await this.startAudioFromUrl(this.audioUrl, this.audioVolume);
+      }
       // Execute all configured steps
       for (let i = 0; i < steps.length; i++) {
         const step = steps[i];
@@ -1022,6 +1063,15 @@ export class VoiceAgentTester {
       console.error(`Error during scenario execution: ${shortMessage}`);
       throw error;
     } finally {
+      // Stop audio from URL if it was started
+      if (this.audioUrl && this.page) {
+        try {
+          await this.stopAudioFromUrl();
+        } catch (e) {
+          // Page might already be closed
+        }
+      }
       // Always finish the run for report generation, even if there was an error
       if (this.reportGenerator) {
         this.reportGenerator.endRun(appName, scenarioName, repetition, success);
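
One detail in the constructor change above may be worth checking: `options.audioVolume || 1.0` falls back to `1.0` for every falsy value, so an explicit `--audio-volume 0` would appear to be treated as full volume. The sketch below contrasts that with nullish coalescing. `resolveVolume` is a hypothetical helper, and the clamping to the documented 0.0 to 1.0 range is an assumption rather than something the package does.

```javascript
// Behaviour of the pattern used in the diff: 0 is falsy, so it becomes 1.0.
const withOr = (volume) => volume || 1.0;

// Hypothetical alternative: only fall back when the value is null/undefined,
// then clamp to the range given in the CLI help text (assumed, not from the package).
function resolveVolume(volume) {
  const value = volume ?? 1.0;
  return Math.min(1, Math.max(0, value));
}

console.log(withOr(0));                // 1
console.log(resolveVolume(0));         // 0
console.log(resolveVolume(0.8));       // 0.8
console.log(resolveVolume(undefined)); // 1
console.log(resolveVolume(2));         // 1
```

Whether muting via `0` is a case the tool needs to support is a judgment call for the maintainers; the sketch only shows the difference between the two operators.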

package/assets/confirmation.mp3 DELETED

Binary file

package/assets/greet_me_angry.mp3 DELETED

Binary file

package/assets/name_lebron_james.mp3 DELETED

Binary file

package/assets/tell_me_joke_laugh.mp3 DELETED

Binary file

package/assets/tell_me_something_funny.mp3 DELETED

Binary file

package/assets/tell_me_something_sad.mp3 DELETED

Binary file

package/benchmarks/applications/vapi.yaml DELETED

@@ -1,10 +0,0 @@
-url: "https://vapi.ai/?demo=true&shareKey={{shareKey}}&assistantId={{assistantId}}"
-steps:
-  - action: wait_for_element
-    selector: "button[aria-label='Talk to Vapi']"
-  - action: sleep
-    time: 3000
-  - action: click
-    selector: "button[aria-label='Talk to Vapi']"
-  - action: sleep
-    time: 2000