@telnyx/voice-agent-tester 0.2.3 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,62 @@
+ ---
+ description: Ralph Loop - Iterative AI development with persistent iteration until task completion
+ ---
+
+ # Ralph Loop Workflow
+
+ This workflow implements the Ralph Loop (Ralph Wiggum) technique for iterative, autonomous coding.
+
+ ## Usage
+
+ Invoke with: `/ralph-loop <task description>`
+
+ Or provide detailed options:
+ ```
+ /ralph-loop "Build feature X" --max-iterations 30 --completion-promise "COMPLETE"
+ ```
+
+ ## Workflow Steps
+
+ 1. **Read the Ralph Loop skill instructions**
+    - View the skill file at `.gemini/skills/ralph-loop/SKILL.md`
+    - Understand the iteration pattern and best practices
+
+ 2. **Parse the user's task**
+    - Identify the main objective
+    - Extract success criteria
+    - Set max iterations (default: 30)
+    - Set completion promise (default: "COMPLETE")
+
+ 3. **Enter the loop**
+    - Execute the task iteratively
+    - Self-correct on failures
+    - Track progress
+    - Continue until success criteria are met or max iterations are reached
+
+ 4. **Report completion**
+    - Summarize accomplishments
+    - Output the completion promise
+    - List any remaining issues
+
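Step 2's option handling (a quoted task string plus `--max-iterations` and `--completion-promise`) can be sketched with standard argument parsing. This is a hypothetical helper, not the plugin's actual parser; the function name is an invention here, and the defaults simply mirror the ones listed in the steps above.

```python
import argparse
import shlex

def parse_ralph_args(invocation: str) -> argparse.Namespace:
    """Parse a /ralph-loop invocation string into a task plus options.

    Hypothetical helper for illustration; the real command parser is not
    shown in this package.
    """
    parser = argparse.ArgumentParser(prog="/ralph-loop")
    parser.add_argument("task", help="task description")
    parser.add_argument("--max-iterations", type=int, default=30)
    parser.add_argument("--completion-promise", default="COMPLETE")
    # shlex.split respects the quotes around the task description
    return parser.parse_args(shlex.split(invocation))

args = parse_ralph_args('"Build feature X" --max-iterations 30 --completion-promise "COMPLETE"')
print(args.task, args.max_iterations, args.completion_promise)
# → Build feature X 30 COMPLETE
```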
+ ## Quick Commands
+
+ - **Start a loop**: `/ralph-loop "Your task here"`
+ - **Cancel loop**: Say "stop", "cancel", or "abort"
+ - **Check skill docs**: View `.gemini/skills/ralph-loop/SKILL.md`
+
+ ## Examples
+
+ ### Feature Implementation
+ ```
+ /ralph-loop "Implement user authentication with JWT tokens. Requirements: login/logout endpoints, password hashing, token refresh. Tests must pass."
+ ```
+
+ ### Bug Fix
+ ```
+ /ralph-loop "Fix the 404 error when importing VAPI assistants. Add retry logic with exponential backoff."
+ ```
+
+ ### Refactoring
+ ```
+ /ralph-loop "Refactor the CLI options to be more provider-agnostic. All existing tests must pass."
+ ```
@@ -0,0 +1,240 @@
+ ---
+ name: ralph-loop
+ description: Ralph Loop - AI Loop Technique for iterative, autonomous coding. Implements persistent iteration until task completion with self-correction patterns.
+ ---
+
+ # Ralph Loop - AI Loop Technique
+
+ The Ralph Loop (also known as "Ralph Wiggum") is an iterative AI development methodology. It embodies the philosophy of **persistent iteration despite setbacks**.
+
+ ## Core Philosophy
+
+ 1. **Iteration > Perfection**: Don't aim for perfect on the first try. Let the loop refine the work.
+ 2. **Failures Are Data**: A loop that is "deterministically bad" fails in predictable, and therefore informative, ways.
+ 3. **Operator Skill Matters**: Success depends on writing good prompts, not just on having a good model.
+ 4. **Persistence Wins**: Keep trying until success; the loop handles retry logic automatically.
+
+ ---
+
+ ## How to Use This Skill
+
+ When the user invokes this skill (e.g., `/ralph-loop` or asks for iterative development), follow these instructions:
+
+ ### Step 1: Understand the Task
+
+ Parse the user's request and identify:
+ - **The main objective** - What needs to be built/fixed/refactored
+ - **Success criteria** - How to know when it's complete
+ - **Max iterations** - Safety limit (default: 30)
+ - **Completion promise** - The signal word (default: "COMPLETE")
+
+ ### Step 2: Enter the Ralph Loop
+
+ Execute the following loop pattern:
+
+ ```
+ ITERATION = 1
+ MAX_ITERATIONS = [specified or 30]
+ COMPLETION_PROMISE = [specified or "COMPLETE"]
+
+ WHILE (ITERATION <= MAX_ITERATIONS) AND (NOT COMPLETED):
+   1. Assess current state
+   2. Identify next step toward goal
+   3. Execute the step (write code, run tests, fix bugs, etc.)
+   4. Evaluate results
+   5. If success criteria met → output COMPLETION_PROMISE → EXIT LOOP
+   6. If not complete → increment ITERATION → CONTINUE
+   7. If blocked → document issue → try alternative approach
+ END WHILE
+
+ IF MAX_ITERATIONS reached without completion:
+   - Document what was accomplished
+   - List blocking issues
+   - Suggest next steps
+ ```
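The loop pattern above can be sketched as a concrete driver. This is a hedged illustration, not the plugin's implementation: the `run_iteration` callback (which performs one iteration and returns the agent's textual output) is an assumption, and the `<promise>...</promise>` tag format is taken from this skill's own conventions.

```python
import re

def ralph_loop(run_iteration, max_iterations=30, promise="COMPLETE"):
    """Drive iterations until the completion promise appears or the cap is hit.

    `run_iteration(i)` is a caller-supplied callback (an assumption here)
    that performs one iteration and returns the agent's textual output.
    """
    pattern = re.compile(rf"<promise>{re.escape(promise)}</promise>")
    for i in range(1, max_iterations + 1):
        output = run_iteration(i)       # assess, execute, evaluate
        if pattern.search(output):      # success criteria met
            return {"completed": True, "iterations": i}
    # Max iterations reached without completion: report partial progress
    return {"completed": False, "iterations": max_iterations}

# Example: a task that "succeeds" on the third iteration
result = ralph_loop(lambda i: "<promise>COMPLETE</promise>" if i == 3 else "still working")
print(result)  # → {'completed': True, 'iterations': 3}
```

Note that the promise is matched against the iteration's output rather than trusted implicitly, which is what makes the loop's exit condition verifiable.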
+
+ ### Step 3: Self-Correction Pattern
+
+ During each iteration, follow this TDD-inspired pattern:
+
+ 1. **Plan** - Identify what needs to happen next
+ 2. **Execute** - Make the change (code, config, etc.)
+ 3. **Verify** - Run tests, check results, validate
+ 4. **If failing** - Debug and fix in the same iteration if possible
+ 5. **If passing** - Move to the next requirement
+ 6. **Refactor** - Clean up if needed before proceeding
+
+ ### Step 4: Report Progress
+
+ After each significant iteration, briefly report:
+ - Current iteration number
+ - What was attempted
+ - Result (success/failure/partial)
+ - Next step
+
+ ### Step 5: Completion
+
+ When all success criteria are met:
+ 1. Summarize what was accomplished
+ 2. List any tests/validations that passed
+ 3. Output the completion promise: `<promise>COMPLETE</promise>`
+
+ ---
+
+ ## Prompt Templates
+
+ ### Feature Implementation
+
+ ```
+ Implement [FEATURE_NAME].
+
+ Requirements:
+ - [Requirement 1]
+ - [Requirement 2]
+ - [Requirement 3]
+
+ Success criteria:
+ - All requirements implemented
+ - Tests passing with >80% coverage
+ - No linter errors
+ - Documentation updated
+
+ Output <promise>COMPLETE</promise> when done.
+ ```
+
+ ### TDD Development
+
+ ```
+ Implement [FEATURE] using TDD.
+
+ Process:
+ 1. Write a failing test for the next requirement
+ 2. Implement minimal code to pass
+ 3. Run tests
+ 4. If failing, fix and retry
+ 5. Refactor if needed
+ 6. Repeat for all requirements
+
+ Requirements: [LIST]
+
+ Output <promise>DONE</promise> when all tests are green.
+ ```
+
+ ### Bug Fixing
+
+ ```
+ Fix bug: [DESCRIPTION]
+
+ Steps:
+ 1. Reproduce the bug
+ 2. Identify the root cause
+ 3. Implement the fix
+ 4. Write a regression test
+ 5. Verify the fix works
+ 6. Check that no new issues were introduced
+
+ If not fixed after 15 iterations:
+ - Document blocking issues
+ - List attempted approaches
+ - Suggest alternatives
+
+ Output <promise>FIXED</promise> when resolved.
+ ```
+
+ ### Refactoring
+
+ ```
+ Refactor [COMPONENT] for [GOAL].
+
+ Constraints:
+ - All existing tests must pass
+ - No behavior changes
+ - Incremental commits
+
+ Checklist:
+ - [ ] Tests passing before start
+ - [ ] Apply refactoring step
+ - [ ] Tests still passing
+ - [ ] Repeat until done
+
+ Output <promise>REFACTORED</promise> when complete.
+ ```
+
+ ---
+
+ ## Advanced Patterns
+
+ ### Multi-Phase Development
+
+ For complex projects, chain multiple loops:
+
+ ```
+ Phase 1: Core implementation → <promise>PHASE1_DONE</promise>
+ Phase 2: API layer → <promise>PHASE2_DONE</promise>
+ Phase 3: Frontend → <promise>PHASE3_DONE</promise>
+ ```
+
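Chaining can be expressed as one loop per phase, each gated on its own promise tag. A minimal sketch under stated assumptions: `run_phase` is a hypothetical callback that runs a full Ralph loop for one phase and returns its final output.

```python
def run_phases(phases, run_phase):
    """Run phases in order; stop at the first phase whose promise never appears.

    `phases` is a list of (name, promise_tag) pairs; `run_phase(name)` is an
    assumed callback that runs one full loop and returns its final output.
    """
    done = []
    for name, promise in phases:
        output = run_phase(name)
        if f"<promise>{promise}</promise>" not in output:
            return done, name  # completed phases, first blocked phase
        done.append(name)
    return done, None

phases = [("core", "PHASE1_DONE"), ("api", "PHASE2_DONE"), ("frontend", "PHASE3_DONE")]
# Simulated phase outputs: the frontend phase never emits its promise
outputs = {
    "core": "<promise>PHASE1_DONE</promise>",
    "api": "<promise>PHASE2_DONE</promise>",
    "frontend": "tests still failing",
}
completed, blocked = run_phases(phases, outputs.get)
print(completed, blocked)  # → ['core', 'api'] frontend
```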
+ ### Incremental Goals
+
+ Break large tasks into phases:
+
+ ```
+ Phase 1: User authentication (JWT, tests)
+ Phase 2: Product catalog (list/search, tests)
+ Phase 3: Shopping cart (add/remove, tests)
+
+ Output <promise>COMPLETE</promise> when all phases are done.
+ ```
+
+ ---
+
+ ## Best Practices for Writing Prompts
+
+ ### ❌ Bad Prompt
+ ```
+ Build a todo API and make it good.
+ ```
+
+ ### ✅ Good Prompt
+ ```
+ Build a REST API for todos.
+
+ When complete:
+ - All CRUD endpoints working
+ - Input validation in place
+ - Tests passing (coverage > 80%)
+ - README with API docs
+
+ Output: <promise>COMPLETE</promise>
+ ```
+
+ ---
+
+ ## When to Use Ralph Loop
+
+ ### ✅ Good For:
+ - Feature implementation with clear requirements
+ - Bug fixing with reproducible issues
+ - Refactoring with existing test coverage
+ - TDD-style development
+ - Tasks that benefit from iteration
+
+ ### ❌ Not Good For:
+ - Exploratory research without clear goals
+ - Tasks requiring human judgment at each step
+ - Real-time interactive sessions
+ - Tasks with no verifiable success criteria
+
+ ---
+
+ ## Cancellation
+
+ The user can cancel the loop at any time by:
+ - Saying "stop", "cancel", or "abort"
+ - Providing new instructions that supersede the current task
+
+ ---
+
+ ## Attribution
+
+ Based on the Ralph Wiggum technique from [Awesome Claude](https://awesomeclaude.ai/ralph-wiggum) and the official Claude plugins marketplace (`ralph-loop@claude-plugins-official`).
package/CHANGELOG.md CHANGED
@@ -1,5 +1,18 @@
  # Changelog

+ ## [0.4.0](https://github.com/team-telnyx/voice-agent-tester/compare/v0.3.0...v0.4.0) (2026-01-26)
+
+ ### Features
+
+ * add audio input from URL for benchmark runs ([c347de8](https://github.com/team-telnyx/voice-agent-tester/commit/c347de83b8318827bac098bff4328502908ee981))
+ * add background noise benchmark with pre-mixed audio files ([9f64179](https://github.com/team-telnyx/voice-agent-tester/commit/9f6417936514451270c4d1bc929771446c366b08))
+
+ ## [0.3.0](https://github.com/team-telnyx/voice-agent-tester/compare/v0.2.3...v0.3.0) (2026-01-23)
+
+ ### Features
+
+ * add comparison benchmark mode for provider imports ([a6de0f4](https://github.com/team-telnyx/voice-agent-tester/commit/a6de0f43e8cfd469ddfcd031c0c05a002662e30a))
+
  ## [0.2.3](https://github.com/team-telnyx/voice-agent-tester/compare/v0.2.2...v0.2.3) (2026-01-21)

  ### Features
package/README.md CHANGED
@@ -3,81 +3,231 @@
  [![CI](https://github.com/team-telnyx/voice-agent-tester/actions/workflows/ci.yml/badge.svg)](https://github.com/team-telnyx/voice-agent-tester/actions/workflows/ci.yml)
  [![npm version](https://img.shields.io/npm/v/@telnyx/voice-agent-tester.svg)](https://www.npmjs.com/package/@telnyx/voice-agent-tester)

- A CLI tool for automated benchmarking and testing of voice AI agents. Supports Telnyx, ElevenLabs, and Vapi.
+ A CLI tool for automated benchmarking and testing of voice AI agents. Supports Telnyx, ElevenLabs, Vapi, and Retell.

- This is a [Telnyx](https://telnyx.com) fork of [livetok-ai/voice-agent-tester](https://github.com/livetok-ai/voice-agent-tester). For base documentation (configuration, actions, etc.), see the [original README](https://github.com/livetok-ai/voice-agent-tester#readme).
+ ## Quick Start

- ## Installation
+ Run directly with npx (no installation required):

  ```bash
- npm install -g @telnyx/voice-agent-tester
+ npx @telnyx/voice-agent-tester@latest -a applications/telnyx.yaml -s scenarios/appointment.yaml --assistant-id <YOUR_ASSISTANT_ID>
  ```

- ## Quick Start
+ Or install globally:

  ```bash
- voice-agent-tester -a benchmarks/applications/telnyx.yaml -s benchmarks/scenarios/appointment.yaml --assistant-id <YOUR_ASSISTANT_ID>
+ npm install -g @telnyx/voice-agent-tester
+ voice-agent-tester -a applications/telnyx.yaml -s scenarios/appointment.yaml --assistant-id <YOUR_ASSISTANT_ID>
  ```

- The CLI includes bundled application and scenario configs that you can use directly.
-
  ## CLI Options

- | Option | Description |
- |--------|-------------|
- | `-a, --applications` | Application config path(s) (required) |
- | `-s, --scenarios` | Scenario config path(s) (required) |
- | `--assistant-id` | Telnyx assistant ID |
- | `--agent-id` | ElevenLabs agent ID |
- | `--branch-id` | ElevenLabs branch ID |
- | `--share-key` | Vapi share key |
- | `--api-key` | Telnyx API key |
- | `--provider` | Import from external provider (`vapi`, `elevenlabs`, `retell`) |
- | `--provider-api-key` | External provider API key |
- | `--provider-import-id` | Provider assistant/agent ID to import |
- | `--params` | Additional URL template params |
- | `--debug` | Enable detailed timeout diagnostics |
- | `--headless` | Run browser in headless mode (default: true) |
- | `--repeat` | Number of repetitions |
- | `-v, --verbose` | Show browser console logs |
+ | Option | Default | Description |
+ |--------|---------|-------------|
+ | `-a, --applications` | required | Application config path(s) or folder |
+ | `-s, --scenarios` | required | Scenario config path(s) or folder |
+ | `--assistant-id` | | Telnyx or provider assistant ID |
+ | `--api-key` | | Telnyx API key for authentication |
+ | `--provider` | | Import from provider (`vapi`, `elevenlabs`, `retell`) |
+ | `--provider-api-key` | | External provider API key (required with `--provider`) |
+ | `--provider-import-id` | | Provider assistant ID to import (required with `--provider`) |
+ | `--compare` | `true` | Run both provider direct and Telnyx import benchmarks |
+ | `--no-compare` | | Disable comparison (run only the Telnyx import) |
+ | `-d, --debug` | `false` | Enable detailed timeout diagnostics |
+ | `-v, --verbose` | `false` | Show browser console logs |
+ | `--headless` | `true` | Run browser in headless mode |
+ | `--repeat` | `1` | Number of repetitions per combination |
+ | `-c, --concurrency` | `1` | Number of parallel tests |
+ | `-r, --report` | | Generate a CSV report to the specified file |
+ | `-p, --params` | | URL template params (e.g., `key=value,key2=value2`) |
+ | `--application-tags` | | Filter applications by comma-separated tags |
+ | `--scenario-tags` | | Filter scenarios by comma-separated tags |
+ | `--assets-server` | `http://localhost:3333` | Assets server URL |
+ | `--audio-url` | | URL of an audio file to play as input during the entire benchmark |
+ | `--audio-volume` | `1.0` | Volume level for audio input (0.0 to 1.0) |

  ## Bundled Configs

- The following application configs are included:
+ | Application Config | Provider |
+ |-------------------|----------|
+ | `applications/telnyx.yaml` | Telnyx AI Widget |
+ | `applications/elevenlabs.yaml` | ElevenLabs |
+ | `applications/vapi.yaml` | Vapi |
+ | `applications/retell.yaml` | Retell |
+ | `applications/livetok.yaml` | Livetok |
+
+ Scenarios:
+ - `scenarios/appointment.yaml` - Basic appointment booking test
+ - `scenarios/appointment_with_noise.yaml` - Appointment with background noise (pre-mixed audio)
+
+ ## Background Noise Testing
+
+ Test voice agents' performance with ambient noise (e.g., crowd chatter, a cafe environment). Background noise is pre-mixed into the audio files to simulate real-world conditions where users speak to voice agents in noisy environments.
+
+ ### Running with Background Noise
+
+ ```bash
+ # Telnyx with background noise
+ npx @telnyx/voice-agent-tester@latest \
+   -a applications/telnyx.yaml \
+   -s scenarios/appointment_with_noise.yaml \
+   --assistant-id <YOUR_ASSISTANT_ID>
+
+ # Compare with no noise (same assistant)
+ npx @telnyx/voice-agent-tester@latest \
+   -a applications/telnyx.yaml \
+   -s scenarios/appointment.yaml \
+   --assistant-id <YOUR_ASSISTANT_ID>
+
+ # Generate a CSV report with metrics
+ npx @telnyx/voice-agent-tester@latest \
+   -a applications/telnyx.yaml \
+   -s scenarios/appointment_with_noise.yaml \
+   --assistant-id <YOUR_ASSISTANT_ID> \
+   -r output/noise_benchmark.csv
+ ```
+
+ ### Custom Audio Input from URL
+
+ Play any audio file from a URL as input throughout the entire benchmark run. The audio is sent to the voice agent as microphone input.
+
+ ```bash
+ # Use custom audio input from a URL
+ npx @telnyx/voice-agent-tester@latest \
+   -a applications/telnyx.yaml \
+   -s scenarios/appointment.yaml \
+   --assistant-id <YOUR_ASSISTANT_ID> \
+   --audio-url "https://example.com/test-audio.mp3" \
+   --audio-volume 0.8
+ ```
+
+ This is useful for:
+ - Testing with custom audio inputs
+ - Using longer audio tracks that play throughout the benchmark
+ - A/B testing different audio sources
+
+ ### Bundled Audio Files
+
+ | File | Description |
+ |------|-------------|
+ | `hello_make_an_appointment.mp3` | Clean appointment request |
+ | `hello_make_an_appointment_with_noise.mp3` | Appointment request with crowd noise |
+ | `appointment_data.mp3` | Clean appointment details |
+ | `appointment_data_with_noise.mp3` | Appointment details with crowd noise |
+
+ ### Scenario Configuration
+
+ The noise scenario uses pre-mixed audio files:
+
+ ```yaml
+ # scenarios/appointment_with_noise.yaml
+ tags:
+   - default
+   - noise
+ steps:
+   - action: wait_for_voice
+   - action: wait_for_silence
+   - action: sleep
+     time: 1000
+   - action: speak
+     file: hello_make_an_appointment_with_noise.mp3
+   - action: wait_for_voice
+     metrics: elapsed_time
+   - action: wait_for_silence
+   - action: speak
+     file: appointment_data_with_noise.mp3
+   - action: wait_for_voice
+     metrics: elapsed_time
+ ```
+
+ ### Metrics and Reports
+
+ The benchmark collects response latency metrics at each `wait_for_voice` step with `metrics: elapsed_time`. Generated CSV reports include:
+
+ ```csv
+ app, scenario, repetition, success, duration, step_9_wait_for_voice_elapsed_time, step_12_wait_for_voice_elapsed_time
+ telnyx, appointment_with_noise, 0, 1, 29654, 1631, 1225
+ ```
151
 
48
- | Config | Provider |
49
- |--------|----------|
50
- | `benchmarks/applications/telnyx.yaml` | Telnyx AI Widget |
51
- | `benchmarks/applications/elevenlabs.yaml` | ElevenLabs |
52
- | `benchmarks/applications/vapi.yaml` | Vapi |
152
+ Compare results with and without noise to measure how background noise affects your voice agent's:
153
+ - Response latency
154
+ - Speech recognition accuracy
155
+ - Overall conversation flow
156
+
157
+ ## Examples
158
+
159
+ ### Telnyx
160
+
161
+ ```bash
162
+ npx @telnyx/voice-agent-tester@latest \
163
+ -a applications/telnyx.yaml \
164
+ -s scenarios/appointment.yaml \
165
+ --assistant-id <ASSISTANT_ID>
166
+ ```
53
167
 
54
- Scenario configs:
55
- - `benchmarks/scenarios/appointment.yaml` - Appointment scheduling test
168
+ ### ElevenLabs
56
169
 
57
- ## Usage Examples
170
+ ```bash
171
+ npx @telnyx/voice-agent-tester@latest \
172
+ -a applications/elevenlabs.yaml \
173
+ -s scenarios/appointment.yaml \
174
+ --assistant-id <AGENT_ID>
175
+ ```
58
176
 
59
- ### Telnyx Assistant
177
+ ### Vapi
60
178
 
61
179
  ```bash
62
- voice-agent-tester -a benchmarks/applications/telnyx.yaml -s benchmarks/scenarios/appointment.yaml --assistant-id <TELNYX_ASSISTANT_ID>
180
+ npx @telnyx/voice-agent-tester@latest \
181
+ -a applications/vapi.yaml \
182
+ -s scenarios/appointment.yaml \
183
+ --assistant-id <ASSISTANT_ID>
63
184
  ```
64
185
 
65
- ### ElevenLabs Agent
186
+ ## Comparison Mode
187
+
188
+ When importing from an external provider, the tool automatically runs both benchmarks in sequence and generates a comparison report:
189
+
190
+ 1. **Provider Direct** - Benchmarks the assistant on the original provider's widget
191
+ 2. **Telnyx Import** - Benchmarks the same assistant after importing to Telnyx
192
+
193
+ ### Import and Compare (Default)
66
194
 
67
195
  ```bash
68
- voice-agent-tester -a benchmarks/applications/elevenlabs.yaml -s benchmarks/scenarios/appointment.yaml --agent-id <ELEVENLABS_AGENT_ID> --branch-id <BRANCH_ID>
196
+ npx @telnyx/voice-agent-tester@latest \
197
+ -a applications/telnyx.yaml \
198
+ -s scenarios/appointment.yaml \
199
+ --provider vapi \
200
+ --api-key <TELNYX_KEY> \
201
+ --provider-api-key <VAPI_KEY> \
202
+ --provider-import-id <VAPI_ASSISTANT_ID>
69
203
  ```
70
204
 
71
- ### Vapi Assistant
205
+ This will:
206
+ - Run Phase 1: VAPI direct benchmark
207
+ - Run Phase 2: Telnyx import benchmark
208
+ - Generate a side-by-side latency comparison report
209
+
210
+ ### Import Only (No Comparison)
211
+
212
+ To skip the provider direct benchmark and only run the Telnyx import:
72
213
 
73
214
  ```bash
74
- voice-agent-tester -a benchmarks/applications/vapi.yaml -s benchmarks/scenarios/appointment.yaml --assistant-id <VAPI_ASSISTANT_ID> --share-key <SHARE_KEY>
215
+ npx @telnyx/voice-agent-tester@latest \
216
+ -a applications/telnyx.yaml \
217
+ -s scenarios/appointment.yaml \
218
+ --provider vapi \
219
+ --no-compare \
220
+ --api-key <TELNYX_KEY> \
221
+ --provider-api-key <VAPI_KEY> \
222
+ --provider-import-id <VAPI_ASSISTANT_ID>
75
223
  ```
76
224
 
77
- ### Import from Provider to Telnyx
225
+ ### Debugging Failures
226
+
227
+ If benchmarks fail, rerun with `--debug` for detailed diagnostics:
78
228
 
79
229
  ```bash
80
- voice-agent-tester -a benchmarks/applications/telnyx.yaml -s benchmarks/scenarios/appointment.yaml --provider vapi --api-key <TELNYX_API_KEY> --provider-api-key <VAPI_API_KEY> --provider-import-id <VAPI_ASSISTANT_ID>
230
+ voice-agent-tester --provider vapi --debug [other options...]
81
231
  ```
82
232
 
83
233
  ## License
@@ -1,10 +1,13 @@
  url: "https://elevenlabs.io/app/talk-to?agent_id={{assistantId}}&branch_id={{branchId}}"
+ tags:
+   - provider
+   - elevenlabs
  steps:
  - action: wait_for_element
-   selector: "button[data-agent-id]"
+   selector: "text=Call AI agent"
  - action: sleep
    time: 3000
  - action: click
-   selector: "button[data-agent-id]"
+   selector: "text=Call AI agent"
  - action: sleep
    time: 2000
@@ -0,0 +1,16 @@
+ url: "https://rti.livetok.io/demo/index.html"
+ tags:
+   - default
+   - basic
+ steps:
+ - action: fill
+   selector: "input[type='password']"
+   text: "GOOGLE_API_KEY HERE"
+ # - action: select
+ #   selector: "#model"
+ #   value: "gemini-2.5-flash-preview-native-audio-dialog"
+ # - action: fill
+ #   selector: "#tools"
+ #   text: "[]"
+ - action: click
+   selector: "#start"
@@ -5,6 +5,6 @@ steps:
  - action: sleep
    time: 3000
  - action: click
-   selector: "telnyx-ai-agent"
+   selector: "telnyx-ai-agent >>> button"
  - action: sleep
    time: 4000
@@ -0,0 +1,19 @@
+ url: "https://vapi.ai?demo=true&shareKey={{shareKey}}&assistantId={{assistantId}}"
+ tags:
+   - provider
+   - vapi
+ steps:
+ - action: wait_for_element
+   selector: "button[aria-label=\"Talk to Vapi\"]"
+ - action: sleep
+   time: 5000
+ - action: click
+   selector: "button[aria-label=\"Talk to Vapi\"]"
+ - action: sleep
+   time: 2000
+ - action: speak
+   text: "Hello, what can you do?"
+ - action: wait_for_voice
+   metrics: elapsed_time
+ - action: wait_for_silence
+   metrics: elapsed_time