vent-hq 0.10.1 → 0.10.2

This diff shows the content of publicly available package versions as released to a supported registry. It is provided for informational purposes only and reflects the changes between the two versions as they appear in the registry.
package/dist/index.mjs CHANGED
@@ -5901,13 +5901,13 @@ import * as fs5 from "node:fs/promises";
  import * as path3 from "node:path";
 
  // src/skills/claude-code.md
- var claude_code_default = '---\nname: vent\ndescription: Voice agent calls \u2014 run calls against your voice agent, get pass/fail results with latency metrics\nallowed-tools: Bash(npx vent-hq *)\n---\n\n# Vent \u2014 Voice Agent Calls\n\nCall voice agents from the terminal. Calls run in the cloud \u2014 results stream back.\n\n## When to Call (read this first)\n\n- After modifying voice agent code (system prompt, tools, handlers): ALWAYS run calls\n- After changing audio config: run calls\n- Before marking a task complete that touches agent behavior: run calls\n- Do NOT skip call verification \u2014 voice agents are non-deterministic, small changes can break flows\n\n## LiveKit Agent Setup (REQUIRED before running LiveKit tests)\n\nWhen the user asks to test, configure, or run anything with a LiveKit agent, you MUST install the Vent LiveKit helper in the agent project and add the instrumentation call to the agent entrypoint BEFORE running any Vent calls.\n- Node.js: `npm install @vent-hq/livekit`, then add `instrumentLiveKitAgent({ ctx, session })` to the agent entrypoint.\n- Python: `pip install vent-livekit`, then add `instrument_livekit_agent(ctx=ctx, session=session)` to the agent entrypoint.\nWithout this, Vent cannot capture metrics, tool calls, usage, or session reports from the LiveKit agent.\n\nBefore running a LiveKit test, verify the agent process is running (`ps aux | grep <agent_process>`). If it\'s not running, start it before attempting any calls.\n\nLiveKit worker restart caveat: After restarting a LiveKit agent process, wait at least 60 seconds before running a Vent call. If calls were attempted against stale workers (failed with "agent did not speak" or "no speech detected"), kill the agent, wait 60 seconds for ALL stale workers to deregister, then restart. 
Do not restart the agent multiple times in quick succession \u2014 each restart creates another stale worker registration that compounds the problem.\n\n## Setup (if no Vent access token)\n\nIf `~/.vent/credentials` does not exist and `VENT_ACCESS_TOKEN` is not set:\n\n```bash\nnpx vent-hq init\n```\n\nOne command. No email, no browser, no human interaction needed. You get 10 free runs instantly.\n\n## Commands\n\n| Command | Purpose |\n|---------|---------|\n| `npx vent-hq init` | First-time setup (creates account + installs skills) |\n| `npx vent-hq agent start -f .vent/suite.<adapter>.json` | Start one shared local agent session (required for `start_command`) |\n| `npx vent-hq agent stop <session-id>` | Close a shared local agent session |\n| `npx vent-hq run -f .vent/suite.<adapter>.json` | Run a call from suite file (auto-selects if only one call) |\n| `npx vent-hq run -f .vent/suite.<adapter>.json --verbose` | Include debug fields in the result JSON |\n| `npx vent-hq run -f .vent/suite.<adapter>.json --call <name>` | Run a specific named call |\n| `npx vent-hq stop <run-id>` | Cancel a queued or running call |\n| `npx vent-hq status <run-id>` | Check results of a previous run |\n| `npx vent-hq status <run-id> --verbose` | Re-print a run with debug fields included |\n\n## When To Use `--verbose`\n\nDefault output is enough for most work. 
It already includes:\n- transcript\n- latency\n- audio analysis\n- tool calls\n- summary cost / recording / transfers\n\nUse `--verbose` only when you need debugging detail that is not in the default result:\n- per-turn debug fields: timestamps, caller decision mode, silence pad, STT confidence, platform transcript\n- raw signal analysis: `debug.signal_quality`\n- harness timings: `debug.harness_overhead`\n- raw prosody payload and warnings\n- raw provider warnings\n- per-turn component latency arrays\n- raw observed tool-call timeline\n- provider-specific metadata in `debug.provider_metadata`\n\nTrigger `--verbose` when:\n- transcript accuracy looks wrong and you need to inspect `platform_transcript`\n- latency is bad and you need per-turn/component breakdowns\n- interruptions/barge-in behavior looks wrong\n- tool-call execution looks inconsistent or missing\n- the provider returned warnings/errors or you need provider-native artifacts\n\nSkip `--verbose` when:\n- you only need pass/fail, transcript, latency, tool calls, recording, or summary\n- you are doing quick iteration on prompt wording and the normal result already explains the failure\n\n## Normalization Contract\n\nVent always returns one normalized result shape on `stdout` across adapters. 
Treat these as the stable categories:\n- `transcript`\n- `latency`\n- `tool_calls`\n- `component_latency`\n- `call_metadata`\n- `warnings`\n- `audio_actions`\n- `emotion`\n\nSource-of-truth policy:\n- Vent computes transcript, latency, and audio-quality metrics itself.\n- Hosted adapters choose the best source per category, usually provider post-call data for tool calls, call metadata, transfers, provider transcripts, and recordings.\n- Realtime provider events are fallback or enrichment only when post-call data is missing, delayed, weaker for that category, or provider-specific.\n- `LiveKit` helper events are the provider-native path for rich in-agent observability.\n- `websocket`/custom agents are realtime-native but still map into the same normalized categories.\n- Keep adapter-specific details in `call_metadata.provider_metadata` or `debug.provider_metadata`, not in new top-level fields.\n\n\n## Critical Rules\n\n1. **Run all calls in parallel in ONE Bash command** \u2014 Claude Code cannot run multiple Bash tool calls concurrently (`npx` is not on the read-only allowlist). Instead, launch all calls in a **single** Bash tool call using `&` and `wait`:\n ```bash\n npx vent-hq run -f .vent/suite.bland.json --call book-fire-inspection & npx vent-hq run -f .vent/suite.bland.json --call cancel-inspection & wait\n ```\n Set `timeout: 300000` (5 min) on the Bash call. NEVER run calls as separate Bash tool calls \u2014 they will serialize.\n2. **If a call gets backgrounded** \u2014 Wait for it to complete before proceeding. Never end your response without the result.\n3. **This skill is self-contained** \u2014 The full config schema is below. Do NOT re-read this file.\n4. **Always analyze results** \u2014 The run command outputs complete JSON with full transcript, latency, and tool calls. Use `--verbose` only when the default result is not enough to explain the failure. Analyze this output directly.\n\n## Workflow\n\n### First time: create the call suite\n\n1. 
Read the voice agent\'s codebase \u2014 understand its system prompt, tools, intents, and domain.\n2. Read the **Full Config Schema** section below for all available fields.\n3. Create the suite file in `.vent/` using the naming convention: `.vent/suite.<adapter>.json` (e.g., `.vent/suite.vapi.json`, `.vent/suite.websocket.json`, `.vent/suite.retell.json`). This prevents confusion when multiple adapters are tested in the same project.\n - Name calls after specific flows (e.g., `"reschedule-appointment"`, not `"call-1"`)\n - Write `caller_prompt` as a realistic persona with a specific goal, based on the agent\'s domain\n - Set `max_turns` based on the flow complexity (simple FAQ: 4-6, booking: 8-12, complex: 12-20)\n\n### Multiple suite files\n\nIf `.vent/` contains more than one suite file, **always check which adapter each suite uses before running**. Read the `connection.adapter` field in each file. Never run a suite intended for a different adapter \u2014 results will be meaningless or fail. When reporting results, always state which suite file produced them (e.g., "Results from `.vent/suite.vapi.json`:").\n\n### Run calls\n\n1. If the suite uses `start_command`, start the shared local session first:\n ```bash\n npx vent-hq agent start -f .vent/suite.<adapter>.json\n ```\n\n2. Run calls:\n ```bash\n # suite with one call (auto-selects)\n npx vent-hq run -f .vent/suite.<adapter>.json\n\n # suite with multiple calls \u2014 pick one by name\n npx vent-hq run -f .vent/suite.<adapter>.json --call happy-path\n\n # local start_command \u2014 add --session\n npx vent-hq run -f .vent/suite.<adapter>.json --call happy-path --session <session-id>\n ```\n\n3. To run multiple calls from the same suite, **run them in parallel in one Bash command**:\n ```bash\n npx vent-hq run -f .vent/suite.vapi.json --call happy-path & npx vent-hq run -f .vent/suite.vapi.json --call edge-case & wait\n ```\n\n4. 
Analyze each result, identify failures, correlate with the codebase, and fix.\n\n5. **Compare with previous run** \u2014 Vent saves full result JSON to `.vent/runs/` after every run. Read the second-most-recent JSON in `.vent/runs/` and compare it against the current run. The saved file shape:\n\n ```jsonc\n {\n "run_id": "\u2026",\n "timestamp": "2026-04-21T\u2026Z",\n "git_sha": "\u2026",\n "summary": { "status": "completed", "calls_total": 2, "calls_passed": 2, "calls_failed": 0, "total_duration_ms": 12345, "total_cost_usd": 0.01 },\n "call_results": [\n { "name": "happy-path", "status": "pass", "duration_ms": 6123, "transcript": [...], "observed_tool_calls": [...], "metrics": { "latency_p50_ms": 420, "latency_p95_ms": 980 }, "cost_usd": 0.004 }\n ]\n }\n ```\n\n Compare at these paths:\n - Status flips: `call_results[i].status` pass\u2192fail\n - Latency: `call_results[i].metrics.latency_p50_ms` / `latency_p95_ms` increased >20%\n - Tool calls: `call_results[i].observed_tool_calls[].successful` count dropped\n - Cost: `summary.total_cost_usd` or `call_results[i].cost_usd` increased >30%\n - Transcripts: `call_results[i].transcript` diverged significantly\n\n Correlate with the code diff (`git diff` between the two runs\' `git_sha` values). If no previous run exists, skip \u2014 this is the baseline.\n\n### After modifying voice agent code\n\nRe-run the existing suite \u2014 no need to recreate it.\n\n## Connection\n\n- **BYO agent runtime**: your agent owns its own provider credentials. Use `start_command` for a local agent or `agent_url` for a hosted custom endpoint.\n- **Platform-direct runtime**: use adapter `vapi | retell | elevenlabs | bland | livekit`. 
This is the only mode where Vent itself needs provider credentials and saved platform connections apply.\n\n## WebSocket Protocol (BYO agents)\n\nWhen using `adapter: "websocket"`, Vent communicates with the agent over a single WebSocket connection:\n\n- **Binary frames** \u2192 PCM audio (16-bit mono, configurable sample rate)\n- **Text frames** \u2192 optional JSON events the agent can send for better test accuracy:\n\n| Event | Format | Purpose |\n|-------|--------|---------|\n| `speech-update` | `{"type":"speech-update","status":"started"\\|"stopped"}` | Enables platform-assisted turn detection (more accurate than VAD alone) |\n| `tool_call` | `{"type":"tool_call","name":"...","arguments":{...},"result":...,"successful":bool,"duration_ms":number}` | Reports tool calls for observability |\n| `vent:timing` | `{"type":"vent:timing","stt_ms":number,"llm_ms":number,"tts_ms":number}` | Reports component latency breakdown per turn |\n| `vent:session` | `{"type":"vent:session","platform":"custom","provider_call_id":"...","provider_session_id":"..."}` | Reports stable provider/session identifiers |\n| `vent:call-metadata` | `{"type":"vent:call-metadata","call_metadata":{...}}` | Reports post-call metadata such as cost, recordings, variables, and provider-specific artifacts |\n| `vent:transcript` | `{"type":"vent:transcript","role":"caller"\\|"agent","text":"...","turn_index":0}` | Reports platform/native transcript text for caller or agent |\n| `vent:transfer` | `{"type":"vent:transfer","destination":"...","status":"attempted"\\|"completed"}` | Reports transfer attempts and outcomes |\n| `vent:debug-url` | `{"type":"vent:debug-url","label":"log","url":"https://..."}` | Reports provider debug/deep-link URLs |\n| `vent:warning` | `{"type":"vent:warning","message":"...","code":"..."}` | Reports provider/runtime warnings worth preserving in run metadata |\n\nVent sends `{"type":"end-call"}` to the agent when the test is done.\n\nAll text frames are optional \u2014 
audio-only agents work fine with VAD-based turn detection.\n\n## Full Config Schema\n\n- ALL calls MUST reference the agent\'s real context (system prompt, tools, knowledge base) from the codebase.\n\n<vent_run>\n{\n "connection": { ... },\n "calls": {\n "happy-path": { ... },\n "edge-case": { ... }\n }\n}\n</vent_run>\n\nOne suite file per platform/adapter. `connection` is declared once, `calls` is a named map of call specs. Each key becomes the call name. Run one call at a time with `--call <name>`.\n\n<config_connection>\n{\n "connection": {\n "adapter": "required -- websocket | livekit | vapi | retell | elevenlabs | bland",\n "start_command": "shell command to start agent (relay only, required for local)",\n "health_endpoint": "health check path after start_command (default: /health, relay only, required for local)",\n "agent_url": "hosted custom agent URL (wss:// or https://). Use for BYO hosted agents.",\n "agent_port": "local agent port (default: 3001, required for local)",\n "platform": "optional authoring convenience for platform-direct adapters only. The CLI resolves this locally, creates/updates a saved platform connection, and strips raw provider secrets before submit. Do not use for websocket start_command or agent_url runs."\n }\n}\n\n<credential_resolution>\nIMPORTANT: How to handle platform credentials (API keys, secrets, agent IDs):\n\nThere are two product modes:\n- `BYO agent runtime`: your agent owns its own provider credentials. This covers both `start_command` (local) and `agent_url` (hosted custom endpoint).\n- `Platform-direct runtime`: Vent talks to `vapi`, `retell`, `elevenlabs`, `bland`, or `livekit` directly. This is the only mode that uses saved platform connections.\n\n1. For `start_command` and `agent_url` runs, do NOT put Deepgram / ElevenLabs / OpenAI / other provider keys into Vent config unless the Vent adapter itself needs them. Those credentials belong to the user\'s local or hosted agent runtime.\n2. 
For platform-direct adapters (`vapi`, `retell`, `elevenlabs`, `bland`, `livekit`), the CLI auto-resolves credentials from `.env.local`, `.env`, and the current shell env. If those env vars already exist, you can omit credential fields from the config JSON entirely.\n3. If you include credential fields in the config, put the ACTUAL VALUE, NOT the env var name. WRONG: `"vapi_api_key": "VAPI_API_KEY"`. RIGHT: `"vapi_api_key": "sk-abc123..."` or omit the field.\n4. The CLI uses the resolved provider config to create or update a saved platform connection server-side, then submits only `platform_connection_id`. Users should not manually author `platform_connection_id`.\n5. To check whether credentials are already available, inspect `.env.local`, `.env`, and any relevant shell env visible to the CLI process.\n6. **IMPORTANT: `npx vent-hq` commands auto-load `.env` files \u2014 never use `source .env && export` before running them.** Only your own custom scripts (e.g. `npx tsx my-script.ts`) need manual env loading. To add a new credential, just append it to `.env` and the CLI picks it up automatically on the next run.\n\nAuto-resolved env vars per platform:\n| Platform | Config field | Env var (auto-resolved from `.env.local`, `.env`, or shell env) |\n|----------|-------------|-----------------------------------|\n| Vapi | vapi_api_key | VAPI_API_KEY |\n| Vapi | vapi_assistant_id | VAPI_ASSISTANT_ID or VAPI_AGENT_ID |\n| Bland | bland_api_key | BLAND_API_KEY |\n| Bland | bland_pathway_id | BLAND_PATHWAY_ID |\n| Bland | persona_id | BLAND_PERSONA_ID |\n| LiveKit | livekit_api_key | LIVEKIT_API_KEY |\n| LiveKit | livekit_api_secret | LIVEKIT_API_SECRET |\n| LiveKit | livekit_url | LIVEKIT_URL |\n| Retell | retell_api_key | RETELL_API_KEY |\n| Retell | retell_agent_id | RETELL_AGENT_ID |\n| ElevenLabs | elevenlabs_api_key | ELEVENLABS_API_KEY |\n| ElevenLabs | elevenlabs_agent_id | ELEVENLABS_AGENT_ID |\n\nThe CLI strips raw platform secrets before `/runs/submit`. 
Platform-direct runs go through a saved `platform_connection_id` automatically. BYO agent runs (`start_command` and `agent_url`) do not.\n</credential_resolution>\n\n<config_adapter_rules>\nWebSocket (local agent via relay):\n{\n "connection": {\n "adapter": "websocket",\n "start_command": "npm run start",\n "health_endpoint": "/health",\n "agent_port": 3001\n }\n}\n\nWebSocket (hosted custom agent):\n{\n "connection": {\n "adapter": "websocket",\n "agent_url": "https://my-agent.fly.dev"\n }\n}\n\nRetell:\n{\n "connection": {\n "adapter": "retell",\n "platform": { "provider": "retell" }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: RETELL_API_KEY, RETELL_AGENT_ID. Only add retell_api_key/retell_agent_id to the JSON if those env vars are not already available.\nmax_concurrency for Retell: Pay-as-you-go includes 20 concurrent calls, with more available on demand; Enterprise has no cap. Ask the user which plan they\'re on. If unknown, default to 20.\n\nBland:\n{\n "connection": {\n "adapter": "bland",\n "platform": { "provider": "bland" }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: BLAND_API_KEY, BLAND_PATHWAY_ID, BLAND_PERSONA_ID. Only add bland_api_key/bland_pathway_id/persona_id to the JSON if those env vars are not already available.\nmax_concurrency for Bland: Start=10, Build=50, Scale=100, Enterprise=unlimited. Ask the user which plan they\'re on. If unknown, default to 10.\nNote: All agent config (voice, model, tools, etc.) is set on the pathway itself, not in Vent config.\n\nVapi:\n{\n "connection": {\n "adapter": "vapi",\n "platform": { "provider": "vapi" }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: VAPI_API_KEY, VAPI_ASSISTANT_ID (or VAPI_AGENT_ID). 
Only add vapi_api_key/vapi_assistant_id to the JSON if those env vars are not already available.\nmax_concurrency for Vapi: every account includes 10 concurrent call slots by default; self-serve accounts can buy extra reserved lines, and Enterprise includes unlimited concurrency. Set this to the user\'s purchased limit. If unknown, default to 10.\nAll assistant config (voice, model, transcriber, interruption settings, etc.) is set on the Vapi assistant itself, not in Vent config.\n\nElevenLabs:\n{\n "connection": {\n "adapter": "elevenlabs",\n "platform": { "provider": "elevenlabs" }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: ELEVENLABS_API_KEY, ELEVENLABS_AGENT_ID. Only add elevenlabs_api_key/elevenlabs_agent_id to the JSON if those env vars are not already available.\nmax_concurrency for ElevenLabs: Free=4, Starter=6, Creator=10, Pro=20, Scale=30, Business=30. Burst pricing can temporarily allow up to 3x the base limit. Ask the user which plan they\'re on and whether burst is enabled. If unknown, default to 4.\n\nLiveKit:\n{\n "connection": {\n "adapter": "livekit",\n "platform": {\n "provider": "livekit",\n "livekit_agent_name": "my-agent",\n "max_concurrency": 5\n }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: LIVEKIT_API_KEY, LIVEKIT_API_SECRET, LIVEKIT_URL. Only add these to the JSON if those env vars are not already available.\nlivekit_agent_name is optional -- only needed if your LiveKit agent registered with an explicit dispatch name in the SDK, e.g. Python `@server.rtc_session(agent_name="\u2026")` or `WorkerOptions(agent_name="\u2026")`, Node.js `new ServerOptions({ agentName: "\u2026" })`. Omit for automatic dispatch.\nThe livekit adapter requires the LiveKit Agents SDK. It depends on Agents SDK signals (lk.agent.state, lk.transcription) for readiness detection, turn timing, and component latency. 
Custom LiveKit participants not using the Agents SDK should use the websocket adapter with a relay instead.\nmax_concurrency for LiveKit Cloud: Build=5, Ship=20, Scale=50 managed inference sessions. Agent session concurrency can be higher (Build=5, Ship=20, Scale up to 600), but managed inference is the usual gating limit for voice agents. Ask the user which tier they\'re on. If unknown, default to 5.\nKnow the provider/account concurrency limits and use them in planning, but Vent does not enforce provider caps at runtime. Hosted worker throughput is an infra setting: `WORKER_TOTAL_CONCURRENCY` caps one worker Machine.\n</config_adapter_rules>\n</config_connection>\n\n\n<call_config>\n<tool_call_capture>\nvapi/retell/elevenlabs/bland: automatic via platform API (no user code needed).\nWebSocket: user\'s agent must emit a JSON text frame per tool call: {"type":"tool_call","name":"...","arguments":{},"result":{},"successful":true,"duration_ms":150}\nLiveKit: use the `@vent-hq/livekit` (Node) or `vent-livekit` (Python) helper. See the "LiveKit Agent Setup" section. The helper captures tool calls automatically from Agents SDK session events \u2014 do not publish on Vent topics manually.\n</tool_call_capture>\n\n<component_timing>\nPlatform adapters (vapi/retell/elevenlabs/bland/livekit) get STT/LLM/TTS breakdown automatically.\nWebSocket agents can opt in by sending a JSON text frame after each agent turn:\n {"type":"vent:timing","stt_ms":120,"llm_ms":450,"tts_ms":80}\nAll fields optional. Send one per agent response. 
Without this, component_latency is omitted from results.\nWhen modifying a WebSocket agent\'s code, add this text frame after TTS completes to enable component latency reporting.\n</component_timing>\n\n<metadata_capture>\nWebSocket agents can emit richer observability metadata as JSON text frames:\n {"type":"vent:session","platform":"custom","provider_call_id":"call_123","provider_session_id":"session_abc"}\n {"type":"vent:call-metadata","call_metadata":{"recording_url":"https://...","cost_usd":0.12,"provider_debug_urls":{"log":"https://..."}}}\n {"type":"vent:debug-url","label":"trace","url":"https://..."}\n {"type":"vent:session-report","report":{"room_name":"room-123","events":[...],"metrics":[...]}}\n {"type":"vent:transcript","role":"caller","text":"I need to reschedule","turn_index":0}\n\n`vent:session-report` in the docs is not a blanket instruction for LiveKit agents. In LiveKit mode, only publish what the helper explicitly supports \u2014 hand-rolling a report from `ctx.addShutdownCallback` runs after `room.disconnect()` and fails with "engine is closed".\n\nLiveKit agents get all metadata through the `@vent-hq/livekit` (Node) / `vent-livekit` (Python) helper \u2014 it subscribes to Agents SDK session events (`metrics_collected`, `function_tools_executed`, `conversation_item_added`, `session_usage_updated`, close) and publishes on Vent topics automatically. Transcript and agent-state timing come from native LiveKit room signals (`lk.transcription`, `lk.agent.state`) \u2014 the helper does not duplicate them.\n\nNode.js \u2014 `npm install @vent-hq/livekit`:\n```ts\nimport { instrumentLiveKitAgent } from "@vent-hq/livekit";\n\nconst vent = instrumentLiveKitAgent({ ctx, session });\n```\nPython \u2014 `pip install vent-livekit`:\n```python\nfrom vent_livekit import instrument_livekit_agent\n\nvent = instrument_livekit_agent(ctx=ctx, session=session)\n```\n\nThe helper is the only supported integration path for LiveKit Agents SDK agents. 
Do not publish on `vent:*` topics manually \u2014 let the helper forward SDK events.\n</metadata_capture>\n\n<config_call>\nEach call in the `calls` map. The key is the call name (e.g. `"reschedule-appointment"`, not `"call-1"`).\n{\n "caller_prompt": "required \u2014 caller persona and behavior (name -> goal -> emotion -> conditional behavior)",\n "max_turns": "required \u2014 default 6",\n "silence_threshold_ms": "optional \u2014 end-of-turn threshold ms (default 800, 200-10000). 800-1200 FAQ, 2000-3000 tool calls, 3000-5000 complex reasoning.",\n "persona": "optional \u2014 caller behavior controls",\n {\n "pace": "slow | normal | fast",\n "clarity": "clear | vague | rambling",\n "disfluencies": "true | false",\n "cooperation": "cooperative | reluctant | hostile",\n "emotion": "neutral | cheerful | confused | frustrated | skeptical | rushed",\n "interruption_style": "optional preplanned interrupt tendency: low | high. If set, Vent may pre-plan a caller cut-in before the agent turn starts. 
It does NOT make a mid-turn interrupt LLM call.",\n "memory": "reliable | unreliable",\n "intent_clarity": "clear | indirect | vague",\n "confirmation_style": "explicit | vague"\n },\n "audio_actions": "optional \u2014 per-turn audio stress calls",\n [\n { "action": "interrupt", "at_turn": "N", "prompt": "what caller says" },\n { "action": "inject_noise", "at_turn": "N", "noise_type": "babble | white | pink", "snr_db": "0-40" },\n { "action": "split_sentence", "at_turn": "N", "split": { "part_a": "...", "part_b": "...", "pause_ms": "500-5000" } },\n { "action": "noise_on_caller", "at_turn": "N" }\n ],\n "prosody": "optional \u2014 Hume emotion analysis (default false)",\n "caller_audio": "optional \u2014 omit for clean audio",\n {\n "noise": { "type": "babble | white | pink", "snr_db": "0-40" },\n "speed": "0.5-2.0 (1.0 = normal)",\n "speakerphone": "true | false",\n "mic_distance": "close | normal | far",\n "clarity": "0.0-1.0 (1.0 = perfect)",\n "accent": "american | british | australian | filipino | spanish_mexican | spanish_peninsular | spanish_colombian | spanish_argentine | german | french | italian | dutch | japanese",\n "packet_loss": "0.0-0.3",\n "jitter_ms": "0-100"\n },\n "language": "optional \u2014 ISO 639-1: en, es, fr, de, it, nl, ja"\n}\n\nInterruption rules:\n- `audio_actions: [{ "action": "interrupt", ... }]` is the deterministic per-turn interrupt test. Prefer this for evaluation.\n- `persona.interruption_style` is only a preplanned caller tendency. 
If used, Vent decides before the agent response starts whether this turn may cut in.\n- Vent no longer pauses mid-turn to ask a second LLM whether to interrupt.\n- For production-faithful testing, prefer explicit `audio_actions.interrupt` over persona interruption.\n\n<examples_call>\n<simple_suite_example>\n{\n "connection": {\n "adapter": "vapi",\n "platform": { "provider": "vapi" }\n },\n "calls": {\n "reschedule-appointment": {\n "caller_prompt": "You are Maria, calling to reschedule her dentist appointment from Thursday to next Tuesday. She\'s in a hurry and wants this done quickly.",\n "max_turns": 8\n },\n "cancel-appointment": {\n "caller_prompt": "You are Tom, calling to cancel his appointment for Friday. He\'s calm and just wants confirmation.",\n "max_turns": 6\n }\n }\n}\n</simple_suite_example>\n\n<advanced_call_example>\nA call entry with advanced options (persona, audio actions, prosody):\n{\n "noisy-interruption-booking": {\n "caller_prompt": "You are James, an impatient customer calling from a loud coffee shop to book a plumber for tomorrow morning. You interrupt the agent mid-sentence when they start listing availability \u2014 you just want the earliest slot.",\n "max_turns": 12,\n "persona": { "pace": "fast", "cooperation": "reluctant", "emotion": "rushed", "interruption_style": "high" },\n "audio_actions": [\n { "action": "interrupt", "at_turn": 3, "prompt": "Just give me the earliest one!" },\n { "action": "inject_noise", "at_turn": 1, "noise_type": "babble", "snr_db": 15 }\n ],\n "caller_audio": { "noise": { "type": "babble", "snr_db": 20 }, "speed": 1.3 },\n "prosody": true\n }\n}\n</advanced_call_example>\n\n</examples_call>\n</config_call>\n\n<output_conversation_test>\n{\n "name": "sarah-hotel-booking",\n "status": "completed",\n "caller_prompt": "You are Sarah, calling to book...",\n "duration_ms": 45200,\n "error": null,\n "transcript": [\n { "role": "caller", "text": "Hi, I\'d like to book..." },\n { "role": "agent", "text": "Sure! 
What date?", "ttfb_ms": 650, "ttfw_ms": 780, "audio_duration_ms": 2400 },\n { "role": "agent", "text": "Let me check availability.", "ttfb_ms": 540, "ttfw_ms": 620, "audio_duration_ms": 1400 },\n { "role": "caller", "text": "Just the earliest slot please", "audio_duration_ms": 900 },\n { "role": "agent", "text": "Sure, the earliest is 9 AM tomorrow.", "ttfb_ms": 220, "ttfw_ms": 260, "audio_duration_ms": 2100 }\n ],\n "latency": {\n "response_time_ms": 890, "response_time_source": "ttfw",\n "p50_response_time_ms": 850, "p90_response_time_ms": 1100, "p95_response_time_ms": 1400, "p99_response_time_ms": 1550,\n "first_response_time_ms": 1950,\n "mean_ttfw_ms": 890, "p50_ttfw_ms": 850, "p95_ttfw_ms": 1400, "p99_ttfw_ms": 1550,\n "first_turn_ttfw_ms": 1950,\n "drift_slope_ms_per_turn": -45.2, "mean_silence_pad_ms": 128, "mouth_to_ear_est_ms": 1020\n },\n "tool_calls": {\n "total": 2, "successful": 2, "failed": 0, "mean_latency_ms": 340,\n "names": ["check_availability", "book_appointment"],\n "observed": [{ "name": "check_availability", "arguments": { "date": "2026-03-12" }, "result": { "slots": ["09:00", "10:00"] }, "successful": true, "latency_ms": 280, "turn_index": 3 }]\n },\n "component_latency": {\n "mean_stt_ms": 120, "mean_llm_ms": 450, "mean_tts_ms": 80,\n "p95_stt_ms": 180, "p95_llm_ms": 620, "p95_tts_ms": 110,\n "mean_speech_duration_ms": 2100,\n "bottleneck": "llm"\n },\n "call_metadata": {\n "platform": "vapi",\n "cost_usd": 0.08,\n "recording_url": "https://example.com/recording",\n "ended_reason": "customer_ended_call",\n "transfers": []\n },\n "warnings": [],\n "audio_actions": [],\n "emotion": {\n "naturalness": 0.72, "mean_calmness": 0.65, "mean_confidence": 0.58, "peak_frustration": 0.08, "emotion_trajectory": "stable"\n }\n}\n\nAlways present: name, status, caller_prompt, duration_ms, error, transcript, tool_calls, warnings, audio_actions. 
Nullable when analysis didn\'t run: latency, component_latency, call_metadata, emotion (requires prosody: true), debug (requires --verbose).\n\n### Result presentation\n\nWhen you report a conversation result to the user, always include:\n\n1. **Summary** \u2014 the overall verdict and the 1-3 most important findings.\n2. **Transcript summary** \u2014 a short narrative of what happened in the call.\n3. **Recording URL** \u2014 include `call_metadata.recording_url` when present; explicitly say when it is unavailable.\n4. **Next steps** \u2014 concrete fixes, follow-up tests, or why no change is needed.\n\nUse metrics to support the summary, not as the whole answer. Do not dump raw numbers without interpretation.\n\nWhen `call_metadata.transfer_attempted` is present, explicitly say whether the transfer only appeared attempted or was mechanically verified as completed (`call_metadata.transfer_completed`). Use `call_metadata.transfers[]` to report transfer type, destination, status, and sources.\n\n### Judging guidance\n\nUse the transcript, metrics, test scenario, and relevant agent instructions/system prompt to judge:\n\n| Dimension | What to check |\n|--------|----------------|\n| **Hallucination detection** | Check whether the agent stated anything not grounded in its instructions, tools, or the conversation itself. |\n| **Instruction following** | Compare the agent\'s behavior against its system prompt and the test\'s expected constraints. |\n| **Context retention** | Check whether the agent forgot or contradicted information established earlier in the call. |\n| **Semantic accuracy** | Check whether the agent correctly understood the caller\'s intent and responded to the real request. |\n| **Goal completion** | Decide whether the agent achieved what the test scenario was designed to verify. 
|\n| **Transfer correctness** | For transfer scenarios, judge whether transfer was appropriate, whether it completed, whether it went to the expected destination, and whether enough context was passed during the handoff. |\n\nIgnore minor STT mis-transcriptions in `transcript` text (e.g. `"check teach hat"` for `"check that"`, swapped homophones, missing question marks on short tails). These are streaming-STT artifacts, not agent bugs. Judge on semantic intent, not exact spelling. Only flag transcript quality when it prevents understanding what the agent actually said.\n\n### Interruption evaluation\n\nEvaluate interruption handling by reading the transcript and listening to the recording. Flag any turn where the agent ignores a barge-in, repeats itself from scratch, or loses context after being cut off.\n\n| Dimension | How to evaluate |\n|--------|----------------|\n| **Recovery** | After a caller cuts in, does the agent\'s next reply acknowledge or address the barge-in rather than restarting from scratch? |\n| **Context retention** | After the interruption, does the agent remember pre-interrupt conversation state (caller name, booking details, earlier answers)? |\n| **Overtalk** | Does the agent keep speaking for long after the caller starts, or does it yield promptly? Use the recording to judge. |\n</output_conversation_test>\n</call_config>\n\n\n## Output\n\n- **Exit codes**: 0=pass, 1=fail, 2=error\n- The `run` command outputs **complete results as pretty-printed JSON** \u2014 including full transcript, latency metrics, tool calls, and audio analysis for every call. Do NOT run a separate `vent status` command \u2014 all data is already in the output.\n\n## Vent Access Token\n\nRun `npx vent-hq login` or set `VENT_ACCESS_TOKEN` env var.\nVent provides DEEPGRAM_API_KEY and ANTHROPIC_API_KEY automatically.\n';
5904
+ var claude_code_default = '---\nname: vent\ndescription: Place real calls against the user\'s voice agent to verify recent code changes worked end-to-end. Trigger on test requests or after any non-trivial voice-agent change.\nallowed-tools: Bash(npx vent-hq *)\n---\n\n# Vent - Voice Agent Calls\n\nUse Vent to verify real voice-agent behavior from the terminal.\n\n## When to Test\n\nVent is the voice CI gate for the agent. A non-trivial change that touches voice behavior is not "done" until the relevant Vent call has been run and you\'ve judged from its output that the change works. Auto-trigger a call after finishing the change \u2014 do not wait to be asked.\n\nNon-trivial means anything that can change what the caller hears, how the agent responds, what tools are called, how the call routes, or how audio timing behaves: prompt rewrites or sections (not a one-word tweak), tool schema or handler changes, state/memory logic, transfer logic, interruption handling, provider/audio config, platform integration code, and any bug fix for voice-agent behavior. One-line copy or comment edits do not need a call.\n\nBefore launching, check `.vent/runs/` for unresolved issues from prior runs (failing status, regressions, warnings, latency jumps). If any are still present and within the scope of the current task, fix them as part of this change, then run the call to verify both the new fix and the prior issues are resolved. Report unresolved prior issues that are out of scope rather than silently leaving them.\n\nDo not declare the task complete until you\'ve reviewed the call results \u2014 transcript, tool calls, observed behavior \u2014 and confirmed the change actually does what was intended. 
Vent\'s `status` and exit code only tell you whether the call ran end-to-end without a pipeline error; mission success is your judgment.\n\n## Autonomous Iteration\n\nYou can run a fix \u2192 Vent-call \u2192 fix \u2192 Vent-call loop within a single user request, doing many iterations before returning. This is genuinely useful for mechanical issues where the fix is high-confidence but the verification is slow.\n\nDefault: don\'t iterate autonomously. Run the relevant call after your change, surface findings, and let the user decide what\'s next \u2014 they stay in control of cost and scope.\n\nTrigger autonomous iteration when either:\n- The user explicitly asks for it ("iterate until it works", "keep going", "autonomous mode", "fix everything").\n- You judge it\'s clearly worth it: the change is well-scoped, the failure mode is mechanical (tool schema, registry, prompt phrasing), and you have a concrete plan for the next attempt. If you\'d be guessing at the next attempt, stop and ask.\n\nCap iterations at ~5 unless the user gave a higher bound. If the same fix attempt fails twice, or the failure mode keeps shifting between attempts, stop and report \u2014 you\'re thrashing.\n\nThe user may not know autonomous iteration is on the table. When you suspect a likely-multi-cycle issue, offer it once before starting solo (e.g. "I can iterate on this autonomously, otherwise I\'ll stop here for your review").\n\n## Claude Code Execution\n\nUse a 5-minute shell-tool timeout (`300000` ms) on Vent run commands so normal calls are not killed by the default 2-minute Bash timeout. This is not backgrounding; wait for stdout/results before ending your response. 
Use the JSON returned by `npx vent-hq run` directly; do not call `vent status` unless checking an older run.\n\n### Parallel Execution\n\nClaude Code serializes separate Bash tool calls for `npx vent-hq ...`, so run multiple calls from one suite by invoking each named call with `--call <name>` in one Bash command using `&` and a final `wait`.\n\n```bash\nnpx vent-hq run -f .vent/suite.vapi.json --call happy-path & \\\nnpx vent-hq run -f .vent/suite.vapi.json --call tool-path & \\\nwait\n```\n\n## Workflow\n\n1. Identify the behavior under test. Read enough of the agent codebase to understand its system prompt, tools, handlers, routes, provider config, platform wiring, and expected handoffs.\n2. Reuse an existing `.vent/suite.<adapter>.json` when possible. If `.vent/` contains multiple suites, inspect `connection.adapter` and report which suite file produced the result.\n3. Create or update a suite only when the existing calls do not cover the changed behavior. Name calls after real flows, for example `reschedule-appointment`, not `call-1`.\n4. If the suite uses `start_command`, start one shared local session first with `npx vent-hq agent start -f .vent/suite.<adapter>.json`, then pass `--session <session-id>` to each run.\n5. Pick which call(s) to run based on the change. Fixed bug: replay the failing scenario. Changed tool: include a call that triggers that tool. Prompt or routing change: include the relevant happy path and any important edge path.\n6. Compare against the previous JSON in `.vent/runs/` when validating a fix or regression. Check status flips, latency jumps, tool-call success drops, cost jumps, and transcript divergence. Correlate with `git diff` between saved `git_sha` values when available; skip if no previous run exists.\n\n## Saved Runs\n\nAfter every run, Vent writes the full result JSON to `.vent/runs/`. 
Shape:\n\n```jsonc\n{\n "run_id": "...",\n "timestamp": "2026-04-21T...Z",\n "git_sha": "...",\n "summary": { "status": "completed", "calls_total": 2, "calls_passed": 2, "calls_failed": 0, "total_duration_ms": 12345, "total_cost_usd": 0.01 },\n "call_results": [\n { "name": "happy-path", "status": "pass", "duration_ms": 6123, "transcript": [], "observed_tool_calls": [], "metrics": { "latency_p50_ms": 420, "latency_p95_ms": 980 }, "cost_usd": 0.004 }\n ]\n}\n```\n\nWhen comparing against a prior run (Workflow step 6), inspect these paths:\n\n- Run-completion status flips: `call_results[i].status` (this only reflects whether the call ran cleanly through the pipeline, not whether the agent accomplished the goal \u2014 judge that from the transcript)\n- Latency: `call_results[i].metrics.latency_p50_ms` or `latency_p95_ms` increased >20%\n- Tool calls: count of `call_results[i].observed_tool_calls[].successful` dropped\n- Cost: `summary.total_cost_usd` or `call_results[i].cost_usd` increased >30%\n- Transcript: `call_results[i].transcript` diverged in semantic content (ignore STT noise)\n\n## Commands\n\n```bash\nnpx vent-hq init # First-time setup: autonomous auth, access token generation, skill install, and minimal starter suite\nnpx vent-hq login # Log in to an existing account and save credentials\nnpx vent-hq run -f .vent/suite.<adapter>.json # Run the only call in a suite, or error if the suite has multiple calls\nnpx vent-hq run -f .vent/suite.<adapter>.json --call happy-path # Run one named call from a multi-call suite\nnpx vent-hq run -f .vent/suite.<adapter>.json --call happy-path --session <session-id> # Run one named call through an existing local relay session\nnpx vent-hq run -f .vent/suite.<adapter>.json --call happy-path --verbose # Run one named call with verbose debug fields\nnpx vent-hq stop <run-id> # Cancel a queued or running run\nnpx vent-hq status <run-id> # Fetch results for a previous run\nnpx vent-hq status <run-id> --verbose # Fetch previous 
run results with verbose debug fields\nnpx vent-hq agent start -f .vent/suite.<adapter>.json # Start a shared local relay session for suites that use start_command\nnpx vent-hq agent stop <session-id> # Stop a shared local relay session\n```\n\nIf `~/.vent/credentials` is missing and `VENT_ACCESS_TOKEN` is not set, run `npx vent-hq init`. For an existing account, run `npx vent-hq login` or set `VENT_ACCESS_TOKEN`.\n\n## Suite Config\n\nSuites live in `.vent/suite.<adapter>.json`. `connection` is declared once per suite. `calls` is a named map, and each key becomes the call name used with `--call`.\n\nLocal websocket suite:\n\n```json\n{\n "connection": {\n "adapter": "websocket",\n "start_command": "npm run start",\n "health_endpoint": "/health",\n "agent_port": 3001\n },\n "calls": {\n "happy-path": {\n "caller_prompt": "You are Maria calling to reschedule her appointment to next Tuesday.",\n "max_turns": 8,\n "silence_threshold_ms": 1200,\n "audio_actions": [\n { "action": "interrupt", "at_turn": 3, "prompt": "Just give me the earliest one." }\n ]\n }\n }\n}\n```\n\nPlatform-direct suite:\n\n```json\n{\n "connection": {\n "adapter": "vapi",\n "platform": { "provider": "vapi" }\n },\n "calls": {\n "happy-path": {\n "caller_prompt": "You are Maria calling to reschedule her appointment to next Tuesday.",\n "max_turns": 8\n }\n }\n}\n```\n\nWrite `caller_prompt` as a realistic caller with a name, goal, mood, constraints, and conditional behavior. Set `max_turns` based on flow complexity: FAQ `4-6`, booking or tool use `8-12`, complex flows `12-20`.\n\nCall fields:\n\n- `caller_prompt` and `max_turns` are required.\n- `silence_threshold_ms` must be `200-10000`. 
Common ranges: FAQ `800-1200`, tool calls `2000-3000`, complex reasoning `3000-5000`.\n- `persona` supports `pace`, `clarity`, `disfluencies`, `cooperation`, `emotion`, `interruption_style`, `memory`, `intent_clarity`, and `confirmation_style`.\n- `audio_actions` supports `interrupt`, `inject_noise`, `split_sentence`, and `noise_on_caller`.\n- `caller_audio` supports noise, speed, speakerphone, mic distance, clarity, accent, packet loss, and jitter.\n- `language` is an ISO 639-1 code such as `en`, `es`, `fr`, `de`, `it`, `nl`, or `ja`.\n- `prosody: true` enables emotion analysis and requires Hume access.\n- Prefer explicit `audio_actions.interrupt` over `persona.interruption_style` for deterministic barge-in tests. `persona.interruption_style` is only a preplanned caller tendency.\n\n## Connections and Credentials\n\n### Adapter choice\n\nUse `websocket` for your own local or hosted runtime. Use `start_command` for local agents or `agent_url` for hosted custom endpoints. For `start_command` and `agent_url`, do not put Deepgram, ElevenLabs, OpenAI, or other agent runtime keys into Vent config unless the Vent adapter itself needs them \u2014 the tested agent owns its own runtime credentials.\n\nUse `vapi`, `retell`, `elevenlabs`, `bland`, or `livekit` for platform-direct testing. In this mode Vent itself talks to the provider on the user\'s behalf.\n\nVent provides `DEEPGRAM_API_KEY` and `ANTHROPIC_API_KEY` for its hosted caller/evaluation stack \u2014 those are Vent\'s, not the tested agent\'s.\n\n### Credential resolution\n\nIn platform-direct mode the CLI auto-resolves credentials from `.env.local`, `.env`, and the current shell environment. Do not run `source .env && export` before Vent commands. If you include credential fields in JSON, use the actual value, not the env var name. 
Do not manually author `platform_connection_id`; the CLI creates or updates the saved platform connection automatically.\n\nAuto-resolved env vars and JSON fields:\n\n- Vapi: `VAPI_API_KEY` -> `vapi_api_key`; `VAPI_ASSISTANT_ID` or `VAPI_AGENT_ID` -> `vapi_assistant_id`\n- Bland: `BLAND_API_KEY` -> `bland_api_key`; `BLAND_PATHWAY_ID` -> `bland_pathway_id`; `BLAND_PERSONA_ID` -> `persona_id`\n- LiveKit: `LIVEKIT_API_KEY` -> `livekit_api_key`; `LIVEKIT_API_SECRET` -> `livekit_api_secret`; `LIVEKIT_URL` -> `livekit_url`\n- Retell: `RETELL_API_KEY` -> `retell_api_key`; `RETELL_AGENT_ID` -> `retell_agent_id`\n- ElevenLabs: `ELEVENLABS_API_KEY` -> `elevenlabs_api_key`; `ELEVENLABS_AGENT_ID` -> `elevenlabs_agent_id`\n\n### Provider config\n\nUse existing provider config when possible: Vapi assistant, Retell agent, ElevenLabs agent, Bland pathway, or LiveKit agent. Bland uniquely supports inline config \u2014 `platform` may use `bland_pathway_id`, `persona_id`, or an inline `task` (with optional voice, model, and turn-handling overrides; see Bland\'s API docs for the full field list).\n\n### Concurrency\n\nWhen you fan out multiple Vent calls in parallel against the same provider (for example, running several named calls from one suite at once with `&` and `wait`), respect the provider\'s per-account concurrency limit. Exceeding it makes calls queue or fail at the provider \u2014 Vent does not enforce these caps for you.\n\nRecord the limit as `max_concurrency` in the suite\'s `platform` block so it\'s visible on future runs. Ask the user which plan they\'re on if sizing matters; otherwise use the conservative default in bold.\n\n- **Vapi**: **10** included per account; reserved lines can be purchased self-serve; Enterprise is unlimited.\n- **Retell**: Pay-as-you-go includes **20**; Enterprise has no cap.\n- **Bland**: Start=**10**, Build=50, Scale=100, Enterprise=unlimited.\n- **ElevenLabs**: Free=**4**, Starter=6, Creator=10, Pro=20, Scale=30, Business=30. 
Burst pricing can temporarily allow up to 3x base.\n- **LiveKit Cloud**: Build=**5**, Ship=20, Scale=50 managed inference sessions (the usual gate for voice agents); agent-session concurrency can go higher (Scale up to 600).\n\n## WebSocket\n\nFor `adapter: "websocket"`, Vent sends binary 16-bit mono PCM audio over one websocket connection. Websocket text frames are optional JSON events. Audio-only websocket agents still work, but events improve turn detection and observability. Vent sends `{"type":"end-call"}` when the test is done.\n\nUseful websocket text frames:\n\n```jsonc\n{"type":"speech-update","status":"started"}\n{"type":"speech-update","status":"stopped"}\n{"type":"tool_call","name":"check_availability","arguments":{},"result":{},"successful":true,"duration_ms":150}\n{"type":"vent:timing","stt_ms":120,"llm_ms":450,"tts_ms":80}\n{"type":"vent:session","platform":"custom","provider_call_id":"call_123","provider_session_id":"session_abc"}\n{"type":"vent:call-metadata","call_metadata":{"recording_url":"https://...","cost_usd":0.12}}\n{"type":"vent:transcript","role":"caller","text":"I need to reschedule","turn_index":0}\n{"type":"vent:transfer","destination":"+15551234567","status":"attempted"}\n{"type":"vent:debug-url","label":"trace","url":"https://..."}\n{"type":"vent:warning","message":"provider warning","code":"provider_warning"}\n```\n\n`vent:session-report` is **not** handled by the websocket adapter \u2014 it\'s only consumed by the LiveKit helper. Do not emit it from a websocket agent.\n\nPlatform adapters capture tool calls automatically. Websocket agents must emit `tool_call` frames for tool observability. Platform adapters get component latency automatically. Websocket agents should emit `vent:timing` after each agent response when STT/LLM/TTS breakdown is available.\n\n## LiveKit\n\nBefore running LiveKit tests, install and add the Vent helper to the LiveKit agent entrypoint. 
Node: `npm install @vent-hq/livekit`, then call `instrumentLiveKitAgent({ ctx, session })`. Python: `pip install vent-livekit`, then call `instrument_livekit_agent(ctx=ctx, session=session)`.\n\nLiveKit direct mode requires the LiveKit Agents SDK. Custom LiveKit participants should use the websocket adapter with a relay. If the LiveKit agent registered with an explicit dispatch name, set `livekit_agent_name` in `platform`.\n\nLiveKit does not support multiple concurrent Vent calls against one agent process yet. Run LiveKit calls sequentially unless you intentionally start separate agent worker processes and route each call to its own process. For Node agents, that means separate Node.js processes. Do not treat parallel calls against a single LiveKit worker as a valid concurrency test until multi-call support is engineered.\n\nUse the LiveKit helper for observability; do not publish `vent:*` topics manually. Do not hand-roll `vent:session-report` from `ctx.addShutdownCallback`; after `room.disconnect()` it can fail with `engine is closed`. The helper captures SDK metrics, tool events, conversation items, usage, and close events. Native LiveKit `lk.transcription` and `lk.agent.state` provide transcript and agent-state timing.\n\nBefore LiveKit tests, verify the agent process is running. After restart, wait at least 60 seconds before running a call. If a LiveKit run fails with `agent did not speak` or `no speech detected`, kill the stale agent, wait 60 seconds, then restart once.\n\n## Vent Output\n\n`npx vent-hq run` returns one JSON result on stdout in non-TTY agent mode; it is not an SSE JSONL stream. Analyze that result directly. Exit codes: `0` = call ran end-to-end through the pipeline; `1` = pipeline-level failure (call did not complete cleanly); `2` = harness error. 
None of these is a judgment of whether the agent actually accomplished the scenario\'s goal \u2014 decide that yourself from the transcript, tool calls, and the scenario\'s expected behavior.\n\nAlways-present keys (value may be `null` for `name`/`error`, otherwise non-null): `name`, `status`, `caller_prompt`, `duration_ms`, `error`, `transcript`, `tool_calls`, `warnings`, `audio_actions`. Always-present keys with nullable value (when the underlying analysis did not run): `latency`, `component_latency`, `call_metadata`, `emotion` (requires `prosody: true`). Conditional key (absent unless requested): `debug` (only when `--verbose`). Branch on null before reading nested fields.\n\nTrigger `--verbose` when:\n- transcript accuracy looks wrong and you need `platform_transcript` to compare against Vent\'s STT\n- latency looks bad and you need per-turn or component-level breakdowns\n- interruption / barge-in behavior looks wrong and you need per-turn debug fields\n- tool-call execution looks inconsistent or missing and you need the raw observed timeline\n- the provider returned warnings or errors and you need provider-native artifacts in `debug.provider_metadata`\n\nSkip `--verbose` for normal pass/fail iteration \u2014 it adds noise to stdout for no benefit.\n\nIgnore minor STT mis-transcriptions in `transcript` text (e.g. `"check teach hat"` for `"check that"`, swapped homophones, missing question marks on short tails). These are streaming-STT artifacts, not agent bugs. Judge on semantic intent, not exact spelling. Only flag transcript quality when it prevents understanding what the agent actually said.\n\n`audio_actions` lists which turns had injected interrupts; check the agent\'s reply at the next turn to judge whether it acknowledged the barge-in or restarted from scratch. 
Overtalk needs the recording and is not evaluable from transcript text alone.\n\nFor transfers, `call_metadata.transfer_attempted` (provider claimed it tried) and `call_metadata.transfer_completed` (mechanically verified by Vent) can disagree \u2014 report both. Use `call_metadata.transfers[]` for destination, type, and per-attempt status.\n\n## Reporting Results\n\nBefore reporting, read the agent\'s code to locate where the observed behavior originates. If the issue is small and you can fix it, fix it and explain what you did \u2014 don\'t ask permission first.\n\nThen write whatever shape of report fits the call. No mandated structure. The report is for a voice-agent developer who wants to know: did my change work, and if not, what do I do next? Adapt the depth to the call \u2014 a clean pass with nothing to report needs little; a regression with a multi-layer cause needs more. Use a transcript excerpt when it helps the user see what actually happened.\n\nHard rules \u2014 these are about not leaking Vent\'s internals into a user-facing report:\n\n- Translate raw numbers into plain English. Users do not know what "p95 850ms" means; say "snappy throughout" or "noticeably sluggish, around 1.6 seconds with the LLM as the bottleneck."\n- Always include the recording from `call_metadata.recording_url` as an inline `[Recording](url)` link, placed in **one block at the very end of the report** \u2014 never sprinkled through the prose. For a single call, it\'s one link as the last line. For multi-call reports, the bottom block lists one link per call labeled by name (e.g. `reschedule-appointment: [Recording](url)`). Never paste a bare URL.\n- Mission success is your judgment, not Vent\'s. The `status` field and exit code only indicate whether the call ran end-to-end through the pipeline. 
Don\'t parrot a Vent "pass" when the transcript shows the agent failed the task, and don\'t flag a Vent "fail" as a real bug when it was a pipeline blip.\n- Don\'t surface `warnings[]` (infrastructure noise). Don\'t surface Vent-side artifacts (caller wait modes, harness timing, internal pipeline quirks). Don\'t include `cost_usd` unless the user asks.\n- Surface only what the user can act on in their own agent\'s code or config.\n\nFor multi-call runs, lead with a short summary stating *your* judgment of what happened across the calls (e.g. "3 of 4 did what they were supposed to; `cancel-appointment` never actually canceled"), not a parroted Vent pass/fail count. Then cover each call with whatever depth it needs.\n';
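The saved-run comparison in Workflow step 6 (status flips, p50/p95 latency up more than 20%, cost up more than 30%) can be sketched as a small script. This is a minimal sketch in JavaScript assuming only the `.vent/runs/` JSON shape documented in the skill text; `diffRuns` and `pctIncrease` are illustrative names, not part of the vent-hq CLI:

```javascript
// Sketch: diff two saved-run JSON objects from .vent/runs/ using the
// regression thresholds named in the skill text (>20% latency, >30% cost).
// Field paths follow the documented saved-run shape; helper names are illustrative.
function pctIncrease(prev, curr) {
  if (prev == null || curr == null || prev === 0) return 0;
  return ((curr - prev) / prev) * 100;
}

function diffRuns(prevRun, currRun) {
  const findings = [];
  const prevByName = new Map(prevRun.call_results.map((c) => [c.name, c]));
  for (const curr of currRun.call_results) {
    const prev = prevByName.get(curr.name);
    if (!prev) continue; // new call, nothing to compare against
    if (prev.status !== curr.status) {
      findings.push(`${curr.name}: status ${prev.status} -> ${curr.status}`);
    }
    const p95 = pctIncrease(prev.metrics?.latency_p95_ms, curr.metrics?.latency_p95_ms);
    if (p95 > 20) findings.push(`${curr.name}: p95 latency up ${p95.toFixed(0)}%`);
  }
  const cost = pctIncrease(prevRun.summary?.total_cost_usd, currRun.summary?.total_cost_usd);
  if (cost > 30) findings.push(`run cost up ${cost.toFixed(0)}%`);
  return findings;
}
```

In practice the two inputs would be the current and second-most-recent JSON files read from `.vent/runs/`; transcript divergence still needs semantic judgment and is deliberately out of scope here.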
5905
5905
 
5906
5906
  // src/skills/cursor.md
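Both skill files mandate the same parallel-execution shape: launch every call in one shell command with `&` and a final `wait`, because separate tool calls serialize. A runnable illustration of that shell pattern, with `sleep` standing in for the long-running `npx vent-hq run ... --call <name>` commands:

```shell
# Sketch of the one-command parallel pattern the skills require:
# background each long-running call with `&`, then `wait` for all of them.
# `sleep 2` stands in for `npx vent-hq run -f <suite> --call <name>`.
start=$(date +%s)
sleep 2 &
sleep 2 &
wait
elapsed=$(( $(date +%s) - start ))
echo "both finished in ${elapsed}s"
```

Because both commands run concurrently, total wall time is roughly one call's duration, not the sum; the same holds for real vent-hq calls, subject to the provider concurrency caps listed in the skill text.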
5907
- var cursor_default = '---\ndescription: Vent \u2014 Voice agent calls. Run calls against your voice agent, get pass/fail results. Use when the user asks to run voice-agent calls or when you need to verify voice agent behavior after code changes.\nalwaysApply: true\n---\n\n# Vent \u2014 Voice Agent Calls\n\nCall voice agents from the terminal. Calls run in the cloud (30-120s).\n\n## When to Call (read this first)\n\n- After modifying voice agent code (system prompt, tools, handlers): ALWAYS run calls\n- After changing audio config: run calls\n- Before marking a task complete that touches agent behavior: run calls\n- Do NOT skip call verification \u2014 voice agents are non-deterministic, small changes can break flows\n\n## LiveKit Agent Setup (REQUIRED before running LiveKit tests)\n\nWhen the user asks to test, configure, or run anything with a LiveKit agent, you MUST install the Vent LiveKit helper in the agent project and add the instrumentation call to the agent entrypoint BEFORE running any Vent calls.\n- Node.js: `npm install @vent-hq/livekit`, then add `instrumentLiveKitAgent({ ctx, session })` to the agent entrypoint.\n- Python: `pip install vent-livekit`, then add `instrument_livekit_agent(ctx=ctx, session=session)` to the agent entrypoint.\nWithout this, Vent cannot capture metrics, tool calls, usage, or session reports from the LiveKit agent.\n\nBefore running a LiveKit test, verify the agent process is running (`ps aux | grep <agent_process>`). If it\'s not running, start it before attempting any calls.\n\nLiveKit worker restart caveat: After restarting a LiveKit agent process, wait at least 60 seconds before running a Vent call. If calls were attempted against stale workers (failed with "agent did not speak" or "no speech detected"), kill the agent, wait 60 seconds for ALL stale workers to deregister, then restart. 
Do not restart the agent multiple times in quick succession \u2014 each restart creates another stale worker registration that compounds the problem.\n\n## Setup (if no Vent access token)\n\nIf `~/.vent/credentials` does not exist and `VENT_ACCESS_TOKEN` is not set:\n\n```bash\nnpx vent-hq init\n```\n\nOne command. No email, no browser, no human interaction needed. You get 10 free runs instantly.\n\n## Commands\n\n| Command | Purpose |\n|---------|---------|\n| `npx vent-hq init` | First-time setup (creates account + installs skills) |\n| `npx vent-hq agent start -f .vent/suite.<adapter>.json` | Start one shared local agent session (required for `start_command`) |\n| `npx vent-hq agent stop <session-id>` | Close a shared local agent session |\n| `npx vent-hq run -f .vent/suite.<adapter>.json` | Run a call from suite file (auto-selects if only one call) |\n| `npx vent-hq run -f .vent/suite.<adapter>.json --verbose` | Include debug fields in the result JSON |\n| `npx vent-hq run -f .vent/suite.<adapter>.json --call <name>` | Run a specific named call |\n| `npx vent-hq stop <run-id>` | Cancel a queued or running call |\n| `npx vent-hq status <run-id>` | Check results of a previous run |\n| `npx vent-hq status <run-id> --verbose` | Re-print a run with debug fields included |\n\n## When To Use `--verbose`\n\nDefault output is enough for most work. 
It already includes:\n- transcript\n- latency\n- audio analysis\n- tool calls\n- summary cost / recording / transfers\n\nUse `--verbose` only when you need debugging detail that is not in the default result:\n- per-turn debug fields: timestamps, caller decision mode, silence pad, STT confidence, platform transcript\n- raw signal analysis: `debug.signal_quality`\n- harness timings: `debug.harness_overhead`\n- raw prosody payload and warnings\n- raw provider warnings\n- per-turn component latency arrays\n- raw observed tool-call timeline\n- provider-specific metadata in `debug.provider_metadata`\n\nTrigger `--verbose` when:\n- transcript accuracy looks wrong and you need to inspect `platform_transcript`\n- latency is bad and you need per-turn/component breakdowns\n- interruptions/barge-in behavior looks wrong\n- tool-call execution looks inconsistent or missing\n- the provider returned warnings/errors or you need provider-native artifacts\n\nSkip `--verbose` when:\n- you only need pass/fail, transcript, latency, tool calls, recording, or summary\n- you are doing quick iteration on prompt wording and the normal result already explains the failure\n\n## Normalization Contract\n\nVent always returns one normalized result shape on `stdout` across adapters. 
Treat these as the stable categories:\n- `transcript`\n- `latency`\n- `tool_calls`\n- `component_latency`\n- `call_metadata`\n- `warnings`\n- `audio_actions`\n- `emotion`\n\nSource-of-truth policy:\n- Vent computes transcript, latency, and audio-quality metrics itself.\n- Hosted adapters choose the best source per category, usually provider post-call data for tool calls, call metadata, transfers, provider transcripts, and recordings.\n- Realtime provider events are fallback or enrichment only when post-call data is missing, delayed, weaker for that category, or provider-specific.\n- `LiveKit` helper events are the provider-native path for rich in-agent observability.\n- `websocket`/custom agents are realtime-native but still map into the same normalized categories.\n- Keep adapter-specific details in `call_metadata.provider_metadata` or `debug.provider_metadata`, not in new top-level fields.\n\n\n## Critical Rules\n\n1. **Run all calls in parallel in ONE shell command** \u2014 Cursor cannot run multiple shell tool calls concurrently. Instead, launch all calls in a **single** shell command using `&` and `wait`. Example: `npx vent-hq run -f .vent/suite.bland.json --call call-1 & npx vent-hq run -f .vent/suite.bland.json --call call-2 & wait`. Set a 300-second (5 min) timeout. NEVER run calls as separate commands \u2014 they will serialize.\n2. **Handle backgrounded commands** \u2014 If a call command gets moved to background by the system, wait for it to complete before proceeding. Never end your response without delivering call results.\n3. **Output format** \u2014 In non-TTY mode (when run by an agent), every SSE event is written to stdout as a JSON line. Results are always in stdout.\n4. **This skill is self-contained** \u2014 The full config schema is below. Do NOT re-read this file.\n5. **Always analyze results** \u2014 The run command outputs complete JSON with full transcript, latency, and tool calls. 
Use `--verbose` only when the default result is not enough to explain the failure. Analyze this output directly \u2014 do NOT run `vent status` afterwards unless you are re-checking a past run.\n\n## Workflow\n\n### First time: create the call suite\n\n1. Read the voice agent\'s codebase \u2014 understand its system prompt, tools, intents, and domain.\n2. Read the **Full Config Schema** section below for all available fields.\n3. Create the suite file in `.vent/` using the naming convention: `.vent/suite.<adapter>.json` (e.g., `.vent/suite.vapi.json`, `.vent/suite.websocket.json`, `.vent/suite.retell.json`). This prevents confusion when multiple adapters are tested in the same project.\n - Name calls after specific flows (e.g., `"reschedule-appointment"`, not `"call-1"`)\n - Write `caller_prompt` as a realistic persona with a specific goal, based on the agent\'s domain\n - Set `max_turns` based on the flow complexity (simple FAQ: 4-6, booking: 8-12, complex: 12-20)\n\n### Multiple suite files\n\nIf `.vent/` contains more than one suite file, **always check which adapter each suite uses before running**. Read the `connection.adapter` field in each file. Never run a suite intended for a different adapter \u2014 results will be meaningless or fail. When reporting results, always state which suite file produced them (e.g., "Results from `.vent/suite.vapi.json`:").\n\n### Subsequent runs \u2014 reuse the existing suite\n\nA matching `.vent/suite.<adapter>.json` already exists? Just re-run it. No need to recreate.\n\n### Run calls\n\n1. If the suite uses `start_command`, start the shared local session first:\n ```\n npx vent-hq agent start -f .vent/suite.<adapter>.json\n ```\n\n2. 
Run calls:\n ```\n # suite with one call (auto-selects)\n npx vent-hq run -f .vent/suite.<adapter>.json\n\n # suite with multiple calls \u2014 pick one by name\n npx vent-hq run -f .vent/suite.<adapter>.json --call happy-path\n\n # local start_command \u2014 add --session\n npx vent-hq run -f .vent/suite.<adapter>.json --call happy-path --session <session-id>\n ```\n\n3. To run multiple calls from the same suite, **run them in parallel in one shell command**:\n ```\n npx vent-hq run -f .vent/suite.vapi.json --call happy-path & npx vent-hq run -f .vent/suite.vapi.json --call edge-case & wait\n ```\n\n4. Analyze each result, identify failures, correlate with the codebase, and fix.\n5. **Compare with previous run** \u2014 Vent saves full result JSON to `.vent/runs/` after every run. Read the second-most-recent JSON in `.vent/runs/` and compare against the current run. Shape: `{ run_id, timestamp, git_sha, summary, call_results: [...] }`. Each entry in `call_results` is a flat normalized per-call result: `{ name, status, duration_ms, transcript, observed_tool_calls, metrics, cost_usd, ... }`. Compare: `call_results[i].status` flips, `call_results[i].metrics.latency_p50_ms` / `latency_p95_ms` changes >20%, `call_results[i].observed_tool_calls[].successful` count drops, `summary.total_cost_usd` increases >30%, `call_results[i].transcript` divergence. Correlate with `git diff` between the two runs\' `git_sha` values. Skip if no previous run exists.\n\n## Connection\n\n- **BYO agent runtime**: your agent owns its own provider credentials. Use `start_command` for a local agent or `agent_url` for a hosted custom endpoint.\n- **Platform-direct runtime**: use adapter `vapi | retell | elevenlabs | bland | livekit`. 
This is the only mode where Vent itself needs provider credentials and saved platform connections apply.\n\n## WebSocket Protocol (BYO agents)\n\nWhen using `adapter: "websocket"`, Vent communicates with the agent over a single WebSocket connection:\n\n- **Binary frames** \u2192 PCM audio (16-bit mono, configurable sample rate)\n- **Text frames** \u2192 optional JSON events the agent can send for better test accuracy:\n\n| Event | Format | Purpose |\n|-------|--------|---------|\n| `speech-update` | `{"type":"speech-update","status":"started"\\|"stopped"}` | Enables platform-assisted turn detection (more accurate than VAD alone) |\n| `tool_call` | `{"type":"tool_call","name":"...","arguments":{...},"result":...,"successful":bool,"duration_ms":number}` | Reports tool calls for observability |\n| `vent:timing` | `{"type":"vent:timing","stt_ms":number,"llm_ms":number,"tts_ms":number}` | Reports component latency breakdown per turn |\n| `vent:session` | `{"type":"vent:session","platform":"custom","provider_call_id":"...","provider_session_id":"..."}` | Reports stable provider/session identifiers |\n| `vent:call-metadata` | `{"type":"vent:call-metadata","call_metadata":{...}}` | Reports post-call metadata such as cost, recordings, variables, and provider-specific artifacts |\n| `vent:transcript` | `{"type":"vent:transcript","role":"caller"\\|"agent","text":"...","turn_index":0}` | Reports platform/native transcript text for caller or agent |\n| `vent:transfer` | `{"type":"vent:transfer","destination":"...","status":"attempted"\\|"completed"}` | Reports transfer attempts and outcomes |\n| `vent:debug-url` | `{"type":"vent:debug-url","label":"log","url":"https://..."}` | Reports provider debug/deep-link URLs |\n| `vent:warning` | `{"type":"vent:warning","message":"...","code":"..."}` | Reports provider/runtime warnings worth preserving in run metadata |\n\nVent sends `{"type":"end-call"}` to the agent when the test is done.\n\nAll text frames are optional \u2014 
audio-only agents work fine with VAD-based turn detection.\n\n## Full Config Schema\n\n- ALL calls MUST reference the agent\'s real context (system prompt, tools, knowledge base) from the codebase.\n\n<vent_run>\n{\n "connection": { ... },\n "calls": {\n "happy-path": { ... },\n "edge-case": { ... }\n }\n}\n</vent_run>\n\nOne suite file per platform/adapter. `connection` is declared once, `calls` is a named map of call specs. Each key becomes the call name. Run one call at a time with `--call <name>`.\n\n<config_connection>\n{\n "connection": {\n "adapter": "required -- websocket | livekit | vapi | retell | elevenlabs | bland",\n "start_command": "shell command to start agent (relay only, required for local)",\n "health_endpoint": "health check path after start_command (default: /health, relay only, required for local)",\n "agent_url": "hosted custom agent URL (wss:// or https://). Use for BYO hosted agents.",\n "agent_port": "local agent port (default: 3001, required for local)",\n "platform": "optional authoring convenience for platform-direct adapters only. The CLI resolves this locally, creates/updates a saved platform connection, and strips raw provider secrets before submit. Do not use for websocket start_command or agent_url runs."\n }\n}\n\n<credential_resolution>\nIMPORTANT: How to handle platform credentials (API keys, secrets, agent IDs):\n\nThere are two product modes:\n- `BYO agent runtime`: your agent owns its own provider credentials. This covers both `start_command` (local) and `agent_url` (hosted custom endpoint).\n- `Platform-direct runtime`: Vent talks to `vapi`, `retell`, `elevenlabs`, `bland`, or `livekit` directly. This is the only mode that uses saved platform connections.\n\n1. For `start_command` and `agent_url` runs, do NOT put Deepgram / ElevenLabs / OpenAI / other provider keys into Vent config unless the Vent adapter itself needs them. Those credentials belong to the user\'s local or hosted agent runtime.\n2. 
For platform-direct adapters (`vapi`, `retell`, `elevenlabs`, `bland`, `livekit`), the CLI auto-resolves credentials from `.env.local`, `.env`, and the current shell env. If those env vars already exist, you can omit credential fields from the config JSON entirely.\n3. If you include credential fields in the config, put the ACTUAL VALUE, NOT the env var name. WRONG: `"vapi_api_key": "VAPI_API_KEY"`. RIGHT: `"vapi_api_key": "sk-abc123..."` or omit the field.\n4. The CLI uses the resolved provider config to create or update a saved platform connection server-side, then submits only `platform_connection_id`. Users should not manually author `platform_connection_id`.\n5. To check whether credentials are already available, inspect `.env.local`, `.env`, and any relevant shell env visible to the CLI process.\n6. **IMPORTANT: `npx vent-hq` commands auto-load `.env` files \u2014 never use `source .env && export` before running them.** Only your own custom scripts (e.g. `npx tsx my-script.ts`) need manual env loading. To add a new credential, just append it to `.env` and the CLI picks it up automatically on the next run.\n\nAuto-resolved env vars per platform:\n| Platform | Config field | Env var (auto-resolved from `.env.local`, `.env`, or shell env) |\n|----------|-------------|-----------------------------------|\n| Vapi | vapi_api_key | VAPI_API_KEY |\n| Vapi | vapi_assistant_id | VAPI_ASSISTANT_ID or VAPI_AGENT_ID |\n| Bland | bland_api_key | BLAND_API_KEY |\n| Bland | bland_pathway_id | BLAND_PATHWAY_ID |\n| Bland | persona_id | BLAND_PERSONA_ID |\n| LiveKit | livekit_api_key | LIVEKIT_API_KEY |\n| LiveKit | livekit_api_secret | LIVEKIT_API_SECRET |\n| LiveKit | livekit_url | LIVEKIT_URL |\n| Retell | retell_api_key | RETELL_API_KEY |\n| Retell | retell_agent_id | RETELL_AGENT_ID |\n| ElevenLabs | elevenlabs_api_key | ELEVENLABS_API_KEY |\n| ElevenLabs | elevenlabs_agent_id | ELEVENLABS_AGENT_ID |\n\nThe CLI strips raw platform secrets before `/runs/submit`. 
Platform-direct runs go through a saved `platform_connection_id` automatically. BYO agent runs (`start_command` and `agent_url`) do not.\n</credential_resolution>\n\n<config_adapter_rules>\nWebSocket (local agent via relay):\n{\n "connection": {\n "adapter": "websocket",\n "start_command": "npm run start",\n "health_endpoint": "/health",\n "agent_port": 3001\n }\n}\n\nWebSocket (hosted custom agent):\n{\n "connection": {\n "adapter": "websocket",\n "agent_url": "https://my-agent.fly.dev"\n }\n}\n\nRetell:\n{\n "connection": {\n "adapter": "retell",\n "platform": { "provider": "retell" }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: RETELL_API_KEY, RETELL_AGENT_ID. Only add retell_api_key/retell_agent_id to the JSON if those env vars are not already available.\nmax_concurrency for Retell: Pay-as-you-go includes 20 concurrent calls, with more available on demand; Enterprise has no cap. Ask the user which plan they\'re on. If unknown, default to 20.\n\nBland:\n{\n "connection": {\n "adapter": "bland",\n "platform": { "provider": "bland" }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: BLAND_API_KEY, BLAND_PATHWAY_ID, BLAND_PERSONA_ID. Only add bland_api_key/bland_pathway_id/persona_id to the JSON if those env vars are not already available.\nmax_concurrency for Bland: Start=10, Build=50, Scale=100, Enterprise=unlimited. Ask the user which plan they\'re on. If unknown, default to 10.\nNote: All agent config (voice, model, tools, etc.) is set on the pathway itself, not in Vent config.\n\nVapi:\n{\n "connection": {\n "adapter": "vapi",\n "platform": { "provider": "vapi" }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: VAPI_API_KEY, VAPI_ASSISTANT_ID (or VAPI_AGENT_ID). 
Only add vapi_api_key/vapi_assistant_id to the JSON if those env vars are not already available.\nmax_concurrency for Vapi: every account includes 10 concurrent call slots by default; self-serve accounts can buy extra reserved lines, and Enterprise includes unlimited concurrency. Set this to the user\'s purchased limit. If unknown, default to 10.\nAll assistant config (voice, model, transcriber, interruption settings, etc.) is set on the Vapi assistant itself, not in Vent config.\n\nElevenLabs:\n{\n "connection": {\n "adapter": "elevenlabs",\n "platform": { "provider": "elevenlabs" }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: ELEVENLABS_API_KEY, ELEVENLABS_AGENT_ID. Only add elevenlabs_api_key/elevenlabs_agent_id to the JSON if those env vars are not already available.\nmax_concurrency for ElevenLabs: Free=4, Starter=6, Creator=10, Pro=20, Scale=30, Business=30. Burst pricing can temporarily allow up to 3x the base limit. Ask the user which plan they\'re on and whether burst is enabled. If unknown, default to 4.\n\nLiveKit:\n{\n "connection": {\n "adapter": "livekit",\n "platform": {\n "provider": "livekit",\n "livekit_agent_name": "my-agent",\n "max_concurrency": 5\n }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: LIVEKIT_API_KEY, LIVEKIT_API_SECRET, LIVEKIT_URL. Only add these to the JSON if those env vars are not already available.\nlivekit_agent_name is optional -- only needed if your LiveKit agent registered with an explicit dispatch name in the SDK, e.g. Python `@server.rtc_session(agent_name="\u2026")` or `WorkerOptions(agent_name="\u2026")`, Node.js `new ServerOptions({ agentName: "\u2026" })`. Omit for automatic dispatch.\nThe livekit adapter requires the LiveKit Agents SDK. It depends on Agents SDK signals (lk.agent.state, lk.transcription) for readiness detection, turn timing, and component latency. 
Custom LiveKit participants not using the Agents SDK should use the websocket adapter with a relay instead.\nmax_concurrency for LiveKit Cloud: Build=5, Ship=20, Scale=50 managed inference sessions. Agent session concurrency can be higher (Build=5, Ship=20, Scale up to 600), but managed inference is the usual gating limit for voice agents. Ask the user which tier they\'re on. If unknown, default to 5.\nKnow the provider/account concurrency limits and use them in planning, but Vent does not enforce provider caps at runtime. Hosted worker throughput is an infra setting: `WORKER_TOTAL_CONCURRENCY` caps one worker Machine.\n</config_adapter_rules>\n</config_connection>\n\n\n<call_config>\n<tool_call_capture>\nvapi/retell/elevenlabs/bland: automatic via platform API (no user code needed).\nWebSocket: user\'s agent must emit a JSON text frame per tool call: {"type":"tool_call","name":"...","arguments":{},"result":{},"successful":true,"duration_ms":150}\nLiveKit: use the `@vent-hq/livekit` (Node) or `vent-livekit` (Python) helper. See the "LiveKit Agent Setup" section. The helper captures tool calls automatically from Agents SDK session events \u2014 do not publish on Vent topics manually.\n</tool_call_capture>\n\n<component_timing>\nPlatform adapters (vapi/retell/elevenlabs/bland/livekit) get STT/LLM/TTS breakdown automatically.\nWebSocket agents can opt in by sending a JSON text frame after each agent turn:\n {"type":"vent:timing","stt_ms":120,"llm_ms":450,"tts_ms":80}\nAll fields optional. Send one per agent response. 
Without this, component_latency is omitted from results.\nWhen modifying a WebSocket agent\'s code, add this text frame after TTS completes to enable component latency reporting.\n</component_timing>\n\n<metadata_capture>\nWebSocket agents can emit richer observability metadata as JSON text frames:\n {"type":"vent:session","platform":"custom","provider_call_id":"call_123","provider_session_id":"session_abc"}\n {"type":"vent:call-metadata","call_metadata":{"recording_url":"https://...","cost_usd":0.12,"provider_debug_urls":{"log":"https://..."}}}\n {"type":"vent:debug-url","label":"trace","url":"https://..."}\n {"type":"vent:session-report","report":{"room_name":"room-123","events":[...],"metrics":[...]}}\n {"type":"vent:transcript","role":"caller","text":"I need to reschedule","turn_index":0}\n\n`vent:session-report` in the docs is not a blanket instruction for LiveKit agents. In LiveKit mode, only publish what the helper explicitly supports \u2014 hand-rolling a report from `ctx.addShutdownCallback` runs after `room.disconnect()` and fails with "engine is closed".\n\nLiveKit agents get all metadata through the `@vent-hq/livekit` (Node) / `vent-livekit` (Python) helper \u2014 it subscribes to Agents SDK session events (`metrics_collected`, `function_tools_executed`, `conversation_item_added`, `session_usage_updated`, close) and publishes on Vent topics automatically. Transcript and agent-state timing come from native LiveKit room signals (`lk.transcription`, `lk.agent.state`) \u2014 the helper does not duplicate them.\n\nNode.js \u2014 `npm install @vent-hq/livekit`:\n```ts\nimport { instrumentLiveKitAgent } from "@vent-hq/livekit";\n\nconst vent = instrumentLiveKitAgent({ ctx, session });\n```\nPython \u2014 `pip install vent-livekit`:\n```python\nfrom vent_livekit import instrument_livekit_agent\n\nvent = instrument_livekit_agent(ctx=ctx, session=session)\n```\n\nThe helper is the only supported integration path for LiveKit Agents SDK agents. 
Do not publish on `vent:*` topics manually \u2014 let the helper forward SDK events.\n</metadata_capture>\n\n<config_call>\nEach call in the `calls` map uses this shape. The key is the call name (e.g. `"reschedule-appointment"`, not `"call-1"`).\n{\n "caller_prompt": "required \u2014 caller persona and behavior (name -> goal -> emotion -> conditional behavior)",\n "max_turns": "required \u2014 typical starting value 6",\n "silence_threshold_ms": "optional \u2014 end-of-turn threshold ms (default 800, 200-10000). 800-1200 FAQ, 2000-3000 tool calls, 3000-5000 complex reasoning.",\n "persona": "optional \u2014 caller behavior controls",\n {\n "pace": "slow | normal | fast",\n "clarity": "clear | vague | rambling",\n "disfluencies": "true | false",\n "cooperation": "cooperative | reluctant | hostile",\n "emotion": "neutral | cheerful | confused | frustrated | skeptical | rushed",\n "interruption_style": "optional preplanned interrupt tendency: low | high. If set, Vent may pre-plan a caller cut-in before the agent turn starts. 
It does NOT make a mid-turn interrupt LLM call.",\n "memory": "reliable | unreliable",\n "intent_clarity": "clear | indirect | vague",\n "confirmation_style": "explicit | vague"\n },\n "audio_actions": "optional \u2014 per-turn audio stress actions",\n [\n { "action": "interrupt", "at_turn": "N", "prompt": "what caller says" },\n { "action": "inject_noise", "at_turn": "N", "noise_type": "babble | white | pink", "snr_db": "0-40" },\n { "action": "split_sentence", "at_turn": "N", "split": { "part_a": "...", "part_b": "...", "pause_ms": "500-5000" } },\n { "action": "noise_on_caller", "at_turn": "N" }\n ],\n "prosody": "optional \u2014 Hume emotion analysis (default false)",\n "caller_audio": "optional \u2014 omit for clean audio",\n {\n "noise": { "type": "babble | white | pink", "snr_db": "0-40" },\n "speed": "0.5-2.0 (1.0 = normal)",\n "speakerphone": "true | false",\n "mic_distance": "close | normal | far",\n "clarity": "0.0-1.0 (1.0 = perfect)",\n "accent": "american | british | australian | filipino | spanish_mexican | spanish_peninsular | spanish_colombian | spanish_argentine | german | french | italian | dutch | japanese",\n "packet_loss": "0.0-0.3",\n "jitter_ms": "0-100"\n },\n "language": "optional \u2014 ISO 639-1: en, es, fr, de, it, nl, ja"\n}\n\nInterruption rules:\n- `audio_actions: [{ "action": "interrupt", ... }]` is the deterministic per-turn interrupt test. Prefer this for evaluation.\n- `persona.interruption_style` is only a preplanned caller tendency. 
If used, Vent decides before the agent response starts whether this turn may cut in.\n- Vent no longer pauses mid-turn to ask a second LLM whether to interrupt.\n- For production-faithful testing, prefer explicit `audio_actions.interrupt` over persona interruption.\n\n<examples_call>\n<simple_suite_example>\n{\n "connection": {\n "adapter": "vapi",\n "platform": { "provider": "vapi" }\n },\n "calls": {\n "reschedule-appointment": {\n "caller_prompt": "You are Maria, calling to reschedule her dentist appointment from Thursday to next Tuesday. She\'s in a hurry and wants this done quickly.",\n "max_turns": 8\n },\n "cancel-appointment": {\n "caller_prompt": "You are Tom, calling to cancel his appointment for Friday. He\'s calm and just wants confirmation.",\n "max_turns": 6\n }\n }\n}\n</simple_suite_example>\n\n<advanced_call_example>\nA call entry with advanced options (persona, audio actions, prosody):\n{\n "noisy-interruption-booking": {\n "caller_prompt": "You are James, an impatient customer calling from a loud coffee shop to book a plumber for tomorrow morning. You interrupt the agent mid-sentence when they start listing availability \u2014 you just want the earliest slot.",\n "max_turns": 12,\n "persona": { "pace": "fast", "cooperation": "reluctant", "emotion": "rushed", "interruption_style": "high" },\n "audio_actions": [\n { "action": "interrupt", "at_turn": 3, "prompt": "Just give me the earliest one!" },\n { "action": "inject_noise", "at_turn": 1, "noise_type": "babble", "snr_db": 15 }\n ],\n "caller_audio": { "noise": { "type": "babble", "snr_db": 20 }, "speed": 1.3 },\n "prosody": true\n }\n}\n</advanced_call_example>\n\n</examples_call>\n</config_call>\n\n<output_conversation_test>\n{\n "name": "sarah-hotel-booking",\n "status": "completed",\n "caller_prompt": "You are Sarah, calling to book...",\n "duration_ms": 45200,\n "error": null,\n "transcript": [\n { "role": "caller", "text": "Hi, I\'d like to book..." },\n { "role": "agent", "text": "Sure! 
What date?", "ttfb_ms": 650, "ttfw_ms": 780, "audio_duration_ms": 2400 },\n { "role": "agent", "text": "Let me check availability.", "ttfb_ms": 540, "ttfw_ms": 620, "audio_duration_ms": 1400 },\n { "role": "caller", "text": "Just the earliest slot please", "audio_duration_ms": 900 },\n { "role": "agent", "text": "Sure, the earliest is 9 AM tomorrow.", "ttfb_ms": 220, "ttfw_ms": 260, "audio_duration_ms": 2100 }\n ],\n "latency": {\n "response_time_ms": 890, "response_time_source": "ttfw",\n "p50_response_time_ms": 850, "p90_response_time_ms": 1100, "p95_response_time_ms": 1400, "p99_response_time_ms": 1550,\n "first_response_time_ms": 1950,\n "mean_ttfw_ms": 890, "p50_ttfw_ms": 850, "p95_ttfw_ms": 1400, "p99_ttfw_ms": 1550,\n "first_turn_ttfw_ms": 1950,\n "drift_slope_ms_per_turn": -45.2, "mean_silence_pad_ms": 128, "mouth_to_ear_est_ms": 1020\n },\n "tool_calls": {\n "total": 2, "successful": 2, "failed": 0, "mean_latency_ms": 340,\n "names": ["check_availability", "book_appointment"],\n "observed": [{ "name": "check_availability", "arguments": { "date": "2026-03-12" }, "result": { "slots": ["09:00", "10:00"] }, "successful": true, "latency_ms": 280, "turn_index": 3 }]\n },\n "component_latency": {\n "mean_stt_ms": 120, "mean_llm_ms": 450, "mean_tts_ms": 80,\n "p95_stt_ms": 180, "p95_llm_ms": 620, "p95_tts_ms": 110,\n "mean_speech_duration_ms": 2100,\n "bottleneck": "llm"\n },\n "call_metadata": {\n "platform": "vapi",\n "cost_usd": 0.08,\n "recording_url": "https://example.com/recording",\n "ended_reason": "customer_ended_call",\n "transfers": []\n },\n "warnings": [],\n "audio_actions": [],\n "emotion": {\n "naturalness": 0.72, "mean_calmness": 0.65, "mean_confidence": 0.58, "peak_frustration": 0.08, "emotion_trajectory": "stable"\n }\n}\n\nAlways present: name, status, caller_prompt, duration_ms, error, transcript, tool_calls, warnings, audio_actions. 
Nullable when analysis didn\'t run: latency, component_latency, call_metadata, emotion (requires prosody: true), debug (requires --verbose).\n\n### Result presentation\n\nWhen you report a conversation result to the user, always include:\n\n1. **Summary** \u2014 the overall verdict and the 1-3 most important findings.\n2. **Transcript summary** \u2014 a short narrative of what happened in the call.\n3. **Recording URL** \u2014 include `call_metadata.recording_url` when present; explicitly say when it is unavailable.\n4. **Next steps** \u2014 concrete fixes, follow-up tests, or why no change is needed.\n\nUse metrics to support the summary, not as the whole answer. Do not dump raw numbers without interpretation.\n\nWhen `call_metadata.transfer_attempted` is present, explicitly say whether the transfer only appeared attempted or was mechanically verified as completed (`call_metadata.transfer_completed`). Use `call_metadata.transfers[]` to report transfer type, destination, status, and sources.\n\n### Judging guidance\n\nUse the transcript, metrics, test scenario, and relevant agent instructions/system prompt to judge:\n\n| Dimension | What to check |\n|--------|----------------|\n| **Hallucination detection** | Check whether the agent stated anything not grounded in its instructions, tools, or the conversation itself. |\n| **Instruction following** | Compare the agent\'s behavior against its system prompt and the test\'s expected constraints. |\n| **Context retention** | Check whether the agent forgot or contradicted information established earlier in the call. |\n| **Semantic accuracy** | Check whether the agent correctly understood the caller\'s intent and responded to the real request. |\n| **Goal completion** | Decide whether the agent achieved what the test scenario was designed to verify. 
|\n| **Transfer correctness** | For transfer scenarios, judge whether transfer was appropriate, whether it completed, whether it went to the expected destination, and whether enough context was passed during the handoff. |\n\nIgnore minor STT mis-transcriptions in `transcript` text (e.g. `"check teach hat"` for `"check that"`, swapped homophones, missing question marks on short tails). These are streaming-STT artifacts, not agent bugs. Judge on semantic intent, not exact spelling. Only flag transcript quality when it prevents understanding what the agent actually said.\n\n### Interruption evaluation\n\nEvaluate interruption handling by reading the transcript and listening to the recording. Flag any turn where the agent ignores a barge-in, repeats itself from scratch, or loses context after being cut off.\n\n| Dimension | How to evaluate |\n|--------|----------------|\n| **Recovery** | After a caller cuts in, does the agent\'s next reply acknowledge or address the barge-in rather than restarting from scratch? |\n| **Context retention** | After the interruption, does the agent remember pre-interrupt conversation state? |\n| **Overtalk** | Does the agent keep speaking for long after the caller starts, or does it yield promptly? Use the recording to judge. |\n</output_conversation_test>\n</call_config>\n\n\n## Exit Codes\n\n0=pass, 1=fail, 2=error\n\n## Vent Access Token\n\nSet `VENT_ACCESS_TOKEN` env var or run `npx vent-hq login`.\n';
5907
+ var cursor_default = '---\ndescription: Place real calls against the user\'s voice agent to verify recent code changes worked end-to-end. Trigger on test requests or after any non-trivial voice-agent change.\nalwaysApply: true\n---\n\n# Vent - Voice Agent Calls\n\nUse Vent to verify real voice-agent behavior from the terminal.\n\n## When to Test\n\nVent is the voice CI gate for the agent. A non-trivial change that touches voice behavior is not "done" until the relevant Vent call has been run and you\'ve judged from its output that the change works. Auto-trigger a call after finishing the change \u2014 do not wait to be asked.\n\nNon-trivial means anything that can change what the caller hears, how the agent responds, what tools are called, how the call routes, or how audio timing behaves: prompt rewrites or section-level edits (not a one-word tweak), tool schema or handler changes, state/memory logic, transfer logic, interruption handling, provider/audio config, platform integration code, and any bug fix for voice-agent behavior. One-line copy or comment edits do not need a call.\n\nBefore launching, check `.vent/runs/` for unresolved issues from prior runs (failing status, regressions, warnings, latency jumps). If any are still present and within the scope of the current task, fix them as part of this change, then run the call to verify both the new fix and the prior issues are resolved. Report unresolved prior issues that are out of scope rather than silently leaving them.\n\nDo not declare the task complete until you\'ve reviewed the call results \u2014 transcript, tool calls, observed behavior \u2014 and confirmed the change actually does what was intended. Vent\'s `status` and exit code only tell you whether the call ran end-to-end without a pipeline error; mission success is your judgment.\n\n## Autonomous Iteration\n\nYou can run a fix \u2192 Vent-call \u2192 fix \u2192 Vent-call loop within a single user request, doing many iterations before returning. 
This is genuinely useful for mechanical issues where the fix is high-confidence but the verification is slow.\n\nDefault: don\'t iterate autonomously. Run the relevant call after your change, surface findings, and let the user decide what\'s next \u2014 they stay in control of cost and scope.\n\nTrigger autonomous iteration when either:\n- The user explicitly asks for it ("iterate until it works", "keep going", "autonomous mode", "fix everything").\n- You judge it\'s clearly worth it: the change is well-scoped, the failure mode is mechanical (tool schema, registry, prompt phrasing), and you have a concrete plan for the next attempt. If you\'d be guessing at the next attempt, stop and ask.\n\nCap iterations at ~5 unless the user gave a higher bound. If the same fix attempt fails twice, or the failure mode keeps shifting between attempts, stop and report \u2014 you\'re thrashing.\n\nThe user may not know autonomous iteration is on the table. When you suspect a likely-multi-cycle issue, offer it once before starting solo (e.g. "I can iterate on this autonomously, otherwise I\'ll stop here for your review").\n\n## Cursor Execution\n\nCursor cannot run separate shell tool calls concurrently, so run multiple calls from one suite by invoking each named call with `--call <name>` in one shell command using `&` and a final `wait`.\n\n```bash\nnpx vent-hq run -f .vent/suite.vapi.json --call happy-path & \\\nnpx vent-hq run -f .vent/suite.vapi.json --call tool-path & \\\nwait\n```\n\nUse a 5-minute shell-tool timeout (`300000` ms) on Vent run commands so normal calls are not killed by the default 2-minute Bash timeout. This is not backgrounding; wait for stdout/results before ending your response. Use the JSON returned by `npx vent-hq run` directly; do not call `vent status` unless checking an older run.\n\n## Workflow\n\n1. Identify the behavior under test. 
Read enough of the agent codebase to understand its system prompt, tools, handlers, routes, provider config, platform wiring, and expected handoffs.\n2. Reuse an existing `.vent/suite.<adapter>.json` when possible. If `.vent/` contains multiple suites, inspect `connection.adapter` and report which suite file produced the result.\n3. Create or update a suite only when the existing calls do not cover the changed behavior. Name calls after real flows, for example `reschedule-appointment`, not `call-1`.\n4. If the suite uses `start_command`, start one shared local session first with `npx vent-hq agent start -f .vent/suite.<adapter>.json`, then pass `--session <session-id>` to each run.\n5. Pick which call(s) to run based on the change. Fixed bug: replay the failing scenario. Changed tool: include a call that triggers that tool. Prompt or routing change: include the relevant happy path and any important edge path.\n6. Compare against the previous JSON in `.vent/runs/` when validating a fix or regression. Check status flips, latency jumps, tool-call success drops, cost jumps, and transcript divergence. Correlate with `git diff` between saved `git_sha` values when available; skip if no previous run exists.\n\n## Saved Runs\n\nAfter every run, Vent writes the full result JSON to `.vent/runs/`. 
Shape:\n\n```jsonc\n{\n "run_id": "...",\n "timestamp": "2026-04-21T...Z",\n "git_sha": "...",\n "summary": { "status": "completed", "calls_total": 2, "calls_passed": 2, "calls_failed": 0, "total_duration_ms": 12345, "total_cost_usd": 0.01 },\n "call_results": [\n { "name": "happy-path", "status": "pass", "duration_ms": 6123, "transcript": [], "observed_tool_calls": [], "metrics": { "latency_p50_ms": 420, "latency_p95_ms": 980 }, "cost_usd": 0.004 }\n ]\n}\n```\n\nWhen comparing against a prior run (Workflow step 6), inspect these paths:\n\n- Run-completion status flips: `call_results[i].status` (this only reflects whether the call ran cleanly through the pipeline, not whether the agent accomplished the goal \u2014 judge that from the transcript)\n- Latency: `call_results[i].metrics.latency_p50_ms` or `latency_p95_ms` increased >20%\n- Tool calls: count of `call_results[i].observed_tool_calls[].successful` dropped\n- Cost: `summary.total_cost_usd` or `call_results[i].cost_usd` increased >30%\n- Transcript: `call_results[i].transcript` diverged in semantic content (ignore STT noise)\n\n## Commands\n\n```bash\nnpx vent-hq init # First-time setup: autonomous auth, access token generation, skill install, and minimal starter suite\nnpx vent-hq login # Log in to an existing account and save credentials\nnpx vent-hq run -f .vent/suite.<adapter>.json # Run the only call in a suite, or error if the suite has multiple calls\nnpx vent-hq run -f .vent/suite.<adapter>.json --call happy-path # Run one named call from a multi-call suite\nnpx vent-hq run -f .vent/suite.<adapter>.json --call happy-path --session <session-id> # Run one named call through an existing local relay session\nnpx vent-hq run -f .vent/suite.<adapter>.json --call happy-path --verbose # Run one named call with verbose debug fields\nnpx vent-hq stop <run-id> # Cancel a queued or running run\nnpx vent-hq status <run-id> # Fetch results for a previous run\nnpx vent-hq status <run-id> --verbose # Fetch previous 
run results with verbose debug fields\nnpx vent-hq agent start -f .vent/suite.<adapter>.json # Start a shared local relay session for suites that use start_command\nnpx vent-hq agent stop <session-id> # Stop a shared local relay session\n```\n\nIf `~/.vent/credentials` is missing and `VENT_ACCESS_TOKEN` is not set, run `npx vent-hq init`. For an existing account, run `npx vent-hq login` or set `VENT_ACCESS_TOKEN`.\n\n## Suite Config\n\nSuites live in `.vent/suite.<adapter>.json`. `connection` is declared once per suite. `calls` is a named map, and each key becomes the call name used with `--call`.\n\nLocal websocket suite:\n\n```json\n{\n "connection": {\n "adapter": "websocket",\n "start_command": "npm run start",\n "health_endpoint": "/health",\n "agent_port": 3001\n },\n "calls": {\n "happy-path": {\n "caller_prompt": "You are Maria calling to reschedule her appointment to next Tuesday.",\n "max_turns": 8,\n "silence_threshold_ms": 1200,\n "audio_actions": [\n { "action": "interrupt", "at_turn": 3, "prompt": "Just give me the earliest one." }\n ]\n }\n }\n}\n```\n\nPlatform-direct suite:\n\n```json\n{\n "connection": {\n "adapter": "vapi",\n "platform": { "provider": "vapi" }\n },\n "calls": {\n "happy-path": {\n "caller_prompt": "You are Maria calling to reschedule her appointment to next Tuesday.",\n "max_turns": 8\n }\n }\n}\n```\n\nWrite `caller_prompt` as a realistic caller with a name, goal, mood, constraints, and conditional behavior. Set `max_turns` based on flow complexity: FAQ `4-6`, booking or tool use `8-12`, complex flows `12-20`.\n\nCall fields:\n\n- `caller_prompt` and `max_turns` are required.\n- `silence_threshold_ms` must be `200-10000`. 
Common ranges: FAQ `800-1200`, tool calls `2000-3000`, complex reasoning `3000-5000`.\n- `persona` supports `pace`, `clarity`, `disfluencies`, `cooperation`, `emotion`, `interruption_style`, `memory`, `intent_clarity`, and `confirmation_style`.\n- `audio_actions` supports `interrupt`, `inject_noise`, `split_sentence`, and `noise_on_caller`.\n- `caller_audio` supports noise, speed, speakerphone, mic distance, clarity, accent, packet loss, and jitter.\n- `language` is an ISO 639-1 code such as `en`, `es`, `fr`, `de`, `it`, `nl`, or `ja`.\n- `prosody: true` enables emotion analysis and requires Hume access.\n- Prefer explicit `audio_actions.interrupt` over `persona.interruption_style` for deterministic barge-in tests. `persona.interruption_style` is only a preplanned caller tendency.\n\n## Connections and Credentials\n\n### Adapter choice\n\nUse `websocket` for your own local or hosted runtime. Use `start_command` for local agents or `agent_url` for hosted custom endpoints. For `start_command` and `agent_url`, do not put Deepgram, ElevenLabs, OpenAI, or other agent runtime keys into Vent config unless the Vent adapter itself needs them \u2014 the tested agent owns its own runtime credentials.\n\nUse `vapi`, `retell`, `elevenlabs`, `bland`, or `livekit` for platform-direct testing. In this mode Vent itself talks to the provider on the user\'s behalf.\n\nVent provides `DEEPGRAM_API_KEY` and `ANTHROPIC_API_KEY` for its hosted caller/evaluation stack \u2014 those are Vent\'s, not the tested agent\'s.\n\n### Credential resolution\n\nIn platform-direct mode the CLI auto-resolves credentials from `.env.local`, `.env`, and the current shell environment. Do not run `source .env && export` before Vent commands. If you include credential fields in JSON, use the actual value, not the env var name. 
Do not manually author `platform_connection_id`; the CLI creates or updates the saved platform connection automatically.\n\nAuto-resolved env vars and JSON fields:\n\n- Vapi: `VAPI_API_KEY` -> `vapi_api_key`; `VAPI_ASSISTANT_ID` or `VAPI_AGENT_ID` -> `vapi_assistant_id`\n- Bland: `BLAND_API_KEY` -> `bland_api_key`; `BLAND_PATHWAY_ID` -> `bland_pathway_id`; `BLAND_PERSONA_ID` -> `persona_id`\n- LiveKit: `LIVEKIT_API_KEY` -> `livekit_api_key`; `LIVEKIT_API_SECRET` -> `livekit_api_secret`; `LIVEKIT_URL` -> `livekit_url`\n- Retell: `RETELL_API_KEY` -> `retell_api_key`; `RETELL_AGENT_ID` -> `retell_agent_id`\n- ElevenLabs: `ELEVENLABS_API_KEY` -> `elevenlabs_api_key`; `ELEVENLABS_AGENT_ID` -> `elevenlabs_agent_id`\n\n### Provider config\n\nUse existing provider config when possible: Vapi assistant, Retell agent, ElevenLabs agent, Bland pathway, or LiveKit agent. Bland uniquely supports inline config \u2014 `platform` may use `bland_pathway_id`, `persona_id`, or an inline `task` (with optional voice, model, and turn-handling overrides; see Bland\'s API docs for the full field list).\n\n### Concurrency\n\nWhen you fan out multiple Vent calls in parallel against the same provider (for example, running several named calls from one suite at once with `&` and `wait`), respect the provider\'s per-account concurrency limit. Exceeding it makes calls queue or fail at the provider \u2014 Vent does not enforce these caps for you.\n\nRecord the limit as `max_concurrency` in the suite\'s `platform` block so it\'s visible on future runs. Ask the user which plan they\'re on if sizing matters; otherwise use the conservative default in bold.\n\n- **Vapi**: **10** included per account; reserved lines can be purchased self-serve; Enterprise is unlimited.\n- **Retell**: Pay-as-you-go includes **20**; Enterprise has no cap.\n- **Bland**: Start=**10**, Build=50, Scale=100, Enterprise=unlimited.\n- **ElevenLabs**: Free=**4**, Starter=6, Creator=10, Pro=20, Scale=30, Business=30. 
Burst pricing can temporarily allow up to 3x base.\n- **LiveKit Cloud**: Build=**5**, Ship=20, Scale=50 managed inference sessions (the usual gate for voice agents); agent-session concurrency can go higher (Scale up to 600).\n\n## WebSocket\n\nFor `adapter: "websocket"`, Vent sends binary 16-bit mono PCM audio over one websocket connection. Websocket text frames are optional JSON events. Audio-only websocket agents still work, but events improve turn detection and observability. Vent sends `{"type":"end-call"}` when the test is done.\n\nUseful websocket text frames:\n\n```jsonc\n{"type":"speech-update","status":"started"}\n{"type":"speech-update","status":"stopped"}\n{"type":"tool_call","name":"check_availability","arguments":{},"result":{},"successful":true,"duration_ms":150}\n{"type":"vent:timing","stt_ms":120,"llm_ms":450,"tts_ms":80}\n{"type":"vent:session","platform":"custom","provider_call_id":"call_123","provider_session_id":"session_abc"}\n{"type":"vent:call-metadata","call_metadata":{"recording_url":"https://...","cost_usd":0.12}}\n{"type":"vent:transcript","role":"caller","text":"I need to reschedule","turn_index":0}\n{"type":"vent:transfer","destination":"+15551234567","status":"attempted"}\n{"type":"vent:debug-url","label":"trace","url":"https://..."}\n{"type":"vent:warning","message":"provider warning","code":"provider_warning"}\n```\n\n`vent:session-report` is **not** handled by the websocket adapter \u2014 it\'s only consumed by the LiveKit helper. Do not emit it from a websocket agent.\n\nPlatform adapters capture tool calls automatically. Websocket agents must emit `tool_call` frames for tool observability. Platform adapters get component latency automatically. Websocket agents should emit `vent:timing` after each agent response when STT/LLM/TTS breakdown is available.\n\n## LiveKit\n\nBefore running LiveKit tests, install and add the Vent helper to the LiveKit agent entrypoint. 
Node: `npm install @vent-hq/livekit`, then call `instrumentLiveKitAgent({ ctx, session })`. Python: `pip install vent-livekit`, then call `instrument_livekit_agent(ctx=ctx, session=session)`.\n\nLiveKit direct mode requires the LiveKit Agents SDK. Custom LiveKit participants should use the websocket adapter with a relay. If the LiveKit agent registered with an explicit dispatch name, set `livekit_agent_name` in `platform`.\n\nLiveKit does not support multiple concurrent Vent calls against one agent process yet. Run LiveKit calls sequentially unless you intentionally start separate agent worker processes and route each call to its own process. For Node agents, that means separate Node.js processes. Do not treat parallel calls against a single LiveKit worker as a valid concurrency test until multi-call support is engineered.\n\nUse the LiveKit helper for observability; do not publish `vent:*` topics manually. Do not hand-roll `vent:session-report` from `ctx.addShutdownCallback`; after `room.disconnect()` it can fail with `engine is closed`. The helper captures SDK metrics, tool events, conversation items, usage, and close events. Native LiveKit `lk.transcription` and `lk.agent.state` provide transcript and agent-state timing.\n\nBefore LiveKit tests, verify the agent process is running. After restart, wait at least 60 seconds before running a call. If a LiveKit run fails with `agent did not speak` or `no speech detected`, kill the stale agent, wait 60 seconds, then restart once.\n\n## Vent Output\n\n`npx vent-hq run` returns one JSON result on stdout in non-TTY agent mode; it is not an SSE JSONL stream. Analyze that result directly. Exit codes: `0` = call ran end-to-end through the pipeline; `1` = pipeline-level failure (call did not complete cleanly); `2` = harness error. 
None of these is a judgment of whether the agent actually accomplished the scenario\'s goal \u2014 decide that yourself from the transcript, tool calls, and the scenario\'s expected behavior.\n\nAlways-present keys (value may be `null` for `name`/`error`, otherwise non-null): `name`, `status`, `caller_prompt`, `duration_ms`, `error`, `transcript`, `tool_calls`, `warnings`, `audio_actions`. Always-present keys with nullable value (when the underlying analysis did not run): `latency`, `component_latency`, `call_metadata`, `emotion` (requires `prosody: true`). Conditional key (absent unless requested): `debug` (only when `--verbose`). Branch on null before reading nested fields.\n\nTrigger `--verbose` when:\n- transcript accuracy looks wrong and you need `platform_transcript` to compare against Vent\'s STT\n- latency looks bad and you need per-turn or component-level breakdowns\n- interruption / barge-in behavior looks wrong and you need per-turn debug fields\n- tool-call execution looks inconsistent or missing and you need the raw observed timeline\n- the provider returned warnings or errors and you need provider-native artifacts in `debug.provider_metadata`\n\nSkip `--verbose` for normal pass/fail iteration \u2014 it adds noise to stdout for no benefit.\n\nIgnore minor STT mis-transcriptions in `transcript` text (e.g. `"check teach hat"` for `"check that"`, swapped homophones, missing question marks on short tails). These are streaming-STT artifacts, not agent bugs. Judge on semantic intent, not exact spelling. Only flag transcript quality when it prevents understanding what the agent actually said.\n\n`audio_actions` lists which turns had injected interrupts; check the agent\'s reply at the next turn to judge whether it acknowledged the barge-in or restarted from scratch. 
Overtalk needs the recording and is not evaluable from transcript text alone.\n\nFor transfers, `call_metadata.transfer_attempted` (provider claimed it tried) and `call_metadata.transfer_completed` (mechanically verified by Vent) can disagree \u2014 report both. Use `call_metadata.transfers[]` for destination, type, and per-attempt status.\n\n## Reporting Results\n\nBefore reporting, read the agent\'s code to locate where the observed behavior originates. If the issue is small and you can fix it, fix it and explain what you did \u2014 don\'t ask permission first.\n\nThen write whatever shape of report fits the call. No mandated structure. The report is for a voice-agent developer who wants to know: did my change work, and if not, what do I do next? Adapt the depth to the call \u2014 a clean pass with nothing to report needs little; a regression with a multi-layer cause needs more. Use a transcript excerpt when it helps the user see what actually happened.\n\nHard rules \u2014 these are about not leaking Vent\'s internals into a user-facing report:\n\n- Translate raw numbers into plain English. Users do not know what "p95 850ms" means; say "snappy throughout" or "noticeably sluggish, around 1.6 seconds with the LLM as the bottleneck."\n- Always include the recording from `call_metadata.recording_url` as an inline `[Recording](url)` link, placed in **one block at the very end of the report** \u2014 never sprinkled through the prose. For a single call, it\'s one link as the last line. For multi-call reports, the bottom block lists one link per call labeled by name (e.g. `reschedule-appointment: [Recording](url)`). Never paste a bare URL.\n- Mission success is your judgment, not Vent\'s. The `status` field and exit code only indicate whether the call ran end-to-end through the pipeline. 
Don\'t parrot a Vent "pass" when the transcript shows the agent failed the task, and don\'t flag a Vent "fail" as a real bug when it was a pipeline blip.\n- Don\'t surface `warnings[]` (infrastructure noise). Don\'t surface Vent-side artifacts (caller wait modes, harness timing, internal pipeline quirks). Don\'t include `cost_usd` unless the user asks.\n- Surface only what the user can act on in their own agent\'s code or config.\n\nFor multi-call runs, lead with a short summary stating *your* judgment of what happened across the calls (e.g. "3 of 4 did what they were supposed to; `cancel-appointment` never actually canceled"), not a parroted Vent pass/fail count. Then cover each call with whatever depth it needs.\n';
5908
5908
 
5909
5909
  // src/skills/codex.md
5910
- var codex_default = '# Vent \u2014 Voice Agent Calls\n\nCall voice agents from the terminal. Calls run in the cloud (30-120s).\n\n## When to Call\n\n- After modifying voice agent code (system prompt, tools, handlers): ALWAYS run calls\n- After changing audio config: run calls\n- Before marking a task complete that touches agent behavior: run calls\n\n## LiveKit Agent Setup (REQUIRED before running LiveKit tests)\n\nWhen the user asks to test, configure, or run anything with a LiveKit agent, you MUST install the Vent LiveKit helper in the agent project and add the instrumentation call to the agent entrypoint BEFORE running any Vent calls.\n- Node.js: `npm install @vent-hq/livekit`, then add `instrumentLiveKitAgent({ ctx, session })` to the agent entrypoint.\n- Python: `pip install vent-livekit`, then add `instrument_livekit_agent(ctx=ctx, session=session)` to the agent entrypoint.\nWithout this, Vent cannot capture metrics, tool calls, usage, or session reports from the LiveKit agent.\n\nBefore running a LiveKit test, verify the agent process is running (`ps aux | grep <agent_process>`). If it\'s not running, start it before attempting any calls.\n\nLiveKit worker restart caveat: After restarting a LiveKit agent process, wait at least 60 seconds before running a Vent call. If calls were attempted against stale workers (failed with "agent did not speak" or "no speech detected"), kill the agent, wait 60 seconds for ALL stale workers to deregister, then restart. Do not restart the agent multiple times in quick succession \u2014 each restart creates another stale worker registration that compounds the problem.\n\n## Setup (if no Vent access token)\n\nIf `~/.vent/credentials` does not exist and `VENT_ACCESS_TOKEN` is not set:\n\n```bash\nnpx vent-hq init\n```\n\nOne command. No email, no browser, no human interaction needed. 
You get 10 free runs instantly.\n\n## Commands\n\n| Command | Purpose |\n|---------|---------|\n| `npx vent-hq init` | First-time setup (creates account + installs skills) |\n| `npx vent-hq agent start -f .vent/suite.<adapter>.json` | Start one shared local agent session (required for `start_command`) |\n| `npx vent-hq agent stop <session-id>` | Close a shared local agent session |\n| `npx vent-hq run -f .vent/suite.<adapter>.json` | Run a call from suite file (auto-selects if only one call) |\n| `npx vent-hq run -f .vent/suite.<adapter>.json --verbose` | Include debug fields in the result JSON |\n| `npx vent-hq run -f .vent/suite.<adapter>.json --call <name>` | Run a specific named call |\n| `npx vent-hq stop <run-id>` | Cancel a queued or running call |\n| `npx vent-hq status <run-id>` | Get full results for a completed run |\n| `npx vent-hq status <run-id> --verbose` | Re-print a run with debug fields included |\n\n## When To Use `--verbose`\n\nDefault output is enough for most iterations. 
It already includes:\n- transcript\n- latency\n- audio analysis\n- tool calls\n- summary cost / recording / transfers\n\nUse `--verbose` only when you need debugging detail that is not in the default result:\n- per-turn debug fields: timestamps, caller decision mode, silence pad, STT confidence, platform transcript\n- raw signal analysis: `debug.signal_quality`\n- harness timings: `debug.harness_overhead`\n- raw prosody payload and warnings\n- raw provider warnings\n- per-turn component latency arrays\n- raw observed tool-call timeline\n- provider-specific metadata in `debug.provider_metadata`\n\nTrigger `--verbose` when:\n- transcript accuracy looks wrong and you need to inspect `platform_transcript`\n- latency is bad and you need per-turn/component breakdowns\n- interruptions/barge-in behavior looks wrong\n- tool-call execution looks inconsistent or missing\n- the provider returned warnings/errors or you need provider-native artifacts\n\nSkip `--verbose` when:\n- you only need pass/fail, transcript, latency, tool calls, recording, or summary\n- you are doing quick iteration on prompt wording and the normal result already explains the failure\n\n## Normalization Contract\n\nVent always returns one normalized result shape on `stdout` across adapters. 
Treat these as the stable categories:\n- `transcript`\n- `latency`\n- `tool_calls`\n- `component_latency`\n- `call_metadata`\n- `warnings`\n- `audio_actions`\n- `emotion`\n\nSource-of-truth policy:\n- Vent computes transcript, latency, and audio-quality metrics itself.\n- Hosted adapters choose the best source per category, usually provider post-call data for tool calls, call metadata, transfers, provider transcripts, and recordings.\n- Realtime provider events are fallback or enrichment only when post-call data is missing, delayed, weaker for that category, or provider-specific.\n- `LiveKit` helper events are the provider-native path for rich in-agent observability.\n- `websocket`/custom agents are realtime-native but still map into the same normalized categories.\n- Keep adapter-specific details in `call_metadata.provider_metadata` or `debug.provider_metadata`, not in new top-level fields.\n\n## Workflow\n\n1. Read the voice agent\'s codebase \u2014 understand its system prompt, tools, intents, and domain.\n2. Read the config schema below for all available fields.\n3. Create the suite file in `.vent/` using the naming convention: `.vent/suite.<adapter>.json` (e.g., `.vent/suite.vapi.json`, `.vent/suite.websocket.json`, `.vent/suite.retell.json`). This prevents confusion when multiple adapters are tested in the same project.\n4. Run calls:\n ```\n # suite with one call (auto-selects)\n npx vent-hq run -f .vent/suite.<adapter>.json\n\n # suite with multiple calls \u2014 pick one by name\n npx vent-hq run -f .vent/suite.<adapter>.json --call happy-path\n\n # local start_command \u2014 first start relay, then add --session\n npx vent-hq agent start -f .vent/suite.<adapter>.json\n npx vent-hq run -f .vent/suite.<adapter>.json --call happy-path --session <session-id>\n ```\n5. To run multiple calls, **run each as a separate shell command** \u2014 Codex executes parallel shell calls concurrently, so each call runs simultaneously. 
Example: emit one `npx vent-hq run --call happy-path` and one `npx vent-hq run --call edge-case` as separate tool calls.\n6. After results return, **compare with previous run** \u2014 Vent saves full result JSON to `.vent/runs/` after every run. Shape: `{ run_id, timestamp, git_sha, summary, call_results: [...] }`. Each entry in `call_results` is a flat normalized per-call result: `{ name, status, duration_ms, transcript, observed_tool_calls, metrics, cost_usd, ... }`. Compare: `call_results[i].status` flips, `call_results[i].metrics.latency_p50_ms` / `latency_p95_ms` changes >20%, `call_results[i].observed_tool_calls[].successful` count drops, `summary.total_cost_usd` increases >30%. Correlate with `git diff` between the two runs\' `git_sha` values. Use `--verbose` only when the default result is not enough to explain the failure. Skip if no previous run exists.\n7. After code changes, re-run the same way.\n\n### Multiple suite files\n\nIf `.vent/` contains more than one suite file, **always check which adapter each suite uses before running**. Read the `connection.adapter` field in each file. Never run a suite intended for a different adapter \u2014 results will be meaningless or fail. When reporting results, always state which suite file produced them (e.g., "Results from `.vent/suite.vapi.json`:").\n\n## Critical Rules\n\n1. **Run all calls in parallel as separate shell commands** \u2014 When a suite has multiple calls, emit each `npx vent-hq run` as its own shell tool call in the same response. Codex runs shell calls concurrently \u2014 they execute simultaneously. Set a 300-second (5 min) timeout on each. Do NOT combine calls into one command with `&`.\n2. **Wait for all results** \u2014 Do not end your response until every call has returned results.\n3. **Output format** \u2014 In non-TTY mode (when run by an agent), every SSE event is written to stdout as a JSON line. Results are always in stdout.\n4. 
**This skill is self-contained** \u2014 The full config schema is below.\n\n## WebSocket Protocol (BYO agents)\n\nWhen using `adapter: "websocket"`, Vent communicates with the agent over a single WebSocket connection:\n\n- **Binary frames** \u2192 PCM audio (16-bit mono, configurable sample rate)\n- **Text frames** \u2192 optional JSON events the agent can send for better test accuracy:\n\n| Event | Format | Purpose |\n|-------|--------|---------|\n| `speech-update` | `{"type":"speech-update","status":"started"\\|"stopped"}` | Enables platform-assisted turn detection (more accurate than VAD alone) |\n| `tool_call` | `{"type":"tool_call","name":"...","arguments":{...},"result":...,"successful":bool,"duration_ms":number}` | Reports tool calls for observability |\n| `vent:timing` | `{"type":"vent:timing","stt_ms":number,"llm_ms":number,"tts_ms":number}` | Reports component latency breakdown per turn |\n| `vent:session` | `{"type":"vent:session","platform":"custom","provider_call_id":"...","provider_session_id":"..."}` | Reports stable provider/session identifiers |\n| `vent:call-metadata` | `{"type":"vent:call-metadata","call_metadata":{...}}` | Reports post-call metadata such as cost, recordings, variables, and provider-specific artifacts |\n| `vent:transcript` | `{"type":"vent:transcript","role":"caller"\\|"agent","text":"...","turn_index":0}` | Reports platform/native transcript text for caller or agent |\n| `vent:transfer` | `{"type":"vent:transfer","destination":"...","status":"attempted"\\|"completed"}` | Reports transfer attempts and outcomes |\n| `vent:debug-url` | `{"type":"vent:debug-url","label":"log","url":"https://..."}` | Reports provider debug/deep-link URLs |\n| `vent:warning` | `{"type":"vent:warning","message":"...","code":"..."}` | Reports provider/runtime warnings worth preserving in run metadata |\n\nVent sends `{"type":"end-call"}` to the agent when the test is done.\n\nAll text frames are optional \u2014 audio-only agents work fine with 
VAD-based turn detection.\n\n## Full Config Schema\n\n- ALL calls MUST reference the agent\'s real context (system prompt, tools, knowledge base) from the codebase.\n\n<vent_run>\n{\n "connection": { ... },\n "calls": {\n "happy-path": { ... },\n "edge-case": { ... }\n }\n}\n</vent_run>\n\nOne suite file per platform/adapter. `connection` is declared once, `calls` is a named map of call specs. Each key becomes the call name. Run one call at a time with `--call <name>`.\n\n<config_connection>\n{\n "connection": {\n "adapter": "required -- websocket | livekit | vapi | retell | elevenlabs | bland",\n "start_command": "shell command to start agent (relay only, required for local)",\n "health_endpoint": "health check path after start_command (default: /health, relay only, required for local)",\n "agent_url": "hosted custom agent URL (wss:// or https://). Use for BYO hosted agents.",\n "agent_port": "local agent port (default: 3001, required for local)",\n "platform": "optional authoring convenience for platform-direct adapters only. The CLI resolves this locally, creates/updates a saved platform connection, and strips raw provider secrets before submit. Do not use for websocket start_command or agent_url runs."\n }\n}\n\n<credential_resolution>\nIMPORTANT: How to handle platform credentials (API keys, secrets, agent IDs):\n\nThere are two product modes:\n- `BYO agent runtime`: your agent owns its own provider credentials. This covers both `start_command` (local) and `agent_url` (hosted custom endpoint).\n- `Platform-direct runtime`: Vent talks to `vapi`, `retell`, `elevenlabs`, `bland`, or `livekit` directly. This is the only mode that uses saved platform connections.\n\n1. For `start_command` and `agent_url` runs, do NOT put Deepgram / ElevenLabs / OpenAI / other provider keys into Vent config unless the Vent adapter itself needs them. Those credentials belong to the user\'s local or hosted agent runtime.\n2. 
For platform-direct adapters (`vapi`, `retell`, `elevenlabs`, `bland`, `livekit`), the CLI auto-resolves credentials from `.env.local`, `.env`, and the current shell env. If those env vars already exist, you can omit credential fields from the config JSON entirely.\n3. If you include credential fields in the config, put the ACTUAL VALUE, NOT the env var name. WRONG: `"vapi_api_key": "VAPI_API_KEY"`. RIGHT: `"vapi_api_key": "sk-abc123..."` or omit the field.\n4. The CLI uses the resolved provider config to create or update a saved platform connection server-side, then submits only `platform_connection_id`. Users should not manually author `platform_connection_id`.\n5. To check whether credentials are already available, inspect `.env.local`, `.env`, and any relevant shell env visible to the CLI process.\n6. **IMPORTANT: `npx vent-hq` commands auto-load `.env` files \u2014 never use `source .env && export` before running them.** Only your own custom scripts (e.g. `npx tsx my-script.ts`) need manual env loading. To add a new credential, just append it to `.env` and the CLI picks it up automatically on the next run.\n\nAuto-resolved env vars per platform:\n| Platform | Config field | Env var (auto-resolved from `.env.local`, `.env`, or shell env) |\n|----------|-------------|-----------------------------------|\n| Vapi | vapi_api_key | VAPI_API_KEY |\n| Vapi | vapi_assistant_id | VAPI_ASSISTANT_ID or VAPI_AGENT_ID |\n| Bland | bland_api_key | BLAND_API_KEY |\n| Bland | bland_pathway_id | BLAND_PATHWAY_ID |\n| Bland | persona_id | BLAND_PERSONA_ID |\n| LiveKit | livekit_api_key | LIVEKIT_API_KEY |\n| LiveKit | livekit_api_secret | LIVEKIT_API_SECRET |\n| LiveKit | livekit_url | LIVEKIT_URL |\n| Retell | retell_api_key | RETELL_API_KEY |\n| Retell | retell_agent_id | RETELL_AGENT_ID |\n| ElevenLabs | elevenlabs_api_key | ELEVENLABS_API_KEY |\n| ElevenLabs | elevenlabs_agent_id | ELEVENLABS_AGENT_ID |\n\nThe CLI strips raw platform secrets before `/runs/submit`. 
Platform-direct runs go through a saved `platform_connection_id` automatically. BYO agent runs (`start_command` and `agent_url`) do not.\n</credential_resolution>\n\n<config_adapter_rules>\nWebSocket (local agent via relay):\n{\n "connection": {\n "adapter": "websocket",\n "start_command": "npm run start",\n "health_endpoint": "/health",\n "agent_port": 3001\n }\n}\n\nWebSocket (hosted custom agent):\n{\n "connection": {\n "adapter": "websocket",\n "agent_url": "https://my-agent.fly.dev"\n }\n}\n\nRetell:\n{\n "connection": {\n "adapter": "retell",\n "platform": { "provider": "retell" }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: RETELL_API_KEY, RETELL_AGENT_ID. Only add retell_api_key/retell_agent_id to the JSON if those env vars are not already available.\nmax_concurrency for Retell: Pay-as-you-go includes 20 concurrent calls, with more available on demand; Enterprise has no cap. Ask the user which plan they\'re on. If unknown, default to 20.\n\nBland:\n{\n "connection": {\n "adapter": "bland",\n "platform": { "provider": "bland" }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: BLAND_API_KEY, BLAND_PATHWAY_ID, BLAND_PERSONA_ID. Only add bland_api_key/bland_pathway_id/persona_id to the JSON if those env vars are not already available.\nmax_concurrency for Bland: Start=10, Build=50, Scale=100, Enterprise=unlimited. Ask the user which plan they\'re on. If unknown, default to 10.\nNote: All agent config (voice, model, tools, etc.) is set on the pathway itself, not in Vent config.\n\nVapi:\n{\n "connection": {\n "adapter": "vapi",\n "platform": { "provider": "vapi" }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: VAPI_API_KEY, VAPI_ASSISTANT_ID (or VAPI_AGENT_ID). 
Only add vapi_api_key/vapi_assistant_id to the JSON if those env vars are not already available.\nmax_concurrency for Vapi: every account includes 10 concurrent call slots by default; self-serve accounts can buy extra reserved lines, and Enterprise includes unlimited concurrency. Set this to the user\'s purchased limit. If unknown, default to 10.\nAll assistant config (voice, model, transcriber, interruption settings, etc.) is set on the Vapi assistant itself, not in Vent config.\n\nElevenLabs:\n{\n "connection": {\n "adapter": "elevenlabs",\n "platform": { "provider": "elevenlabs" }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: ELEVENLABS_API_KEY, ELEVENLABS_AGENT_ID. Only add elevenlabs_api_key/elevenlabs_agent_id to the JSON if those env vars are not already available.\nmax_concurrency for ElevenLabs: Free=4, Starter=6, Creator=10, Pro=20, Scale=30, Business=30. Burst pricing can temporarily allow up to 3x the base limit. Ask the user which plan they\'re on and whether burst is enabled. If unknown, default to 4.\n\nLiveKit:\n{\n "connection": {\n "adapter": "livekit",\n "platform": {\n "provider": "livekit",\n "livekit_agent_name": "my-agent",\n "max_concurrency": 5\n }\n }\n}\nCredentials auto-resolve from `.env.local`, `.env`, or shell env: LIVEKIT_API_KEY, LIVEKIT_API_SECRET, LIVEKIT_URL. Only add these to the JSON if those env vars are not already available.\nlivekit_agent_name is optional -- only needed if your LiveKit agent registered with an explicit dispatch name in the SDK, e.g. Python `@server.rtc_session(agent_name="\u2026")` or `WorkerOptions(agent_name="\u2026")`, Node.js `new ServerOptions({ agentName: "\u2026" })`. Omit for automatic dispatch.\nThe livekit adapter requires the LiveKit Agents SDK. It depends on Agents SDK signals (lk.agent.state, lk.transcription) for readiness detection, turn timing, and component latency. 
Custom LiveKit participants not using the Agents SDK should use the websocket adapter with a relay instead.\nmax_concurrency for LiveKit Cloud: Build=5, Ship=20, Scale=50 managed inference sessions. Agent session concurrency can be higher (Build=5, Ship=20, Scale up to 600), but managed inference is the usual gating limit for voice agents. Ask the user which tier they\'re on. If unknown, default to 5.\nKnow the provider/account concurrency limits and use them in planning, but Vent does not enforce provider caps at runtime. Hosted worker throughput is an infra setting: `WORKER_TOTAL_CONCURRENCY` caps one worker Machine.\n</config_adapter_rules>\n</config_connection>\n\n\n<call_config>\n<tool_call_capture>\nvapi/retell/elevenlabs/bland: automatic via platform API (no user code needed).\nWebSocket: user\'s agent must emit a JSON text frame per tool call: {"type":"tool_call","name":"...","arguments":{},"result":{},"successful":true,"duration_ms":150}\nLiveKit: use the `@vent-hq/livekit` (Node) or `vent-livekit` (Python) helper. See the "LiveKit Agent Setup" section. The helper captures tool calls automatically from Agents SDK session events \u2014 do not publish on Vent topics manually.\n</tool_call_capture>\n\n<component_timing>\nPlatform adapters (vapi/retell/elevenlabs/bland/livekit) get STT/LLM/TTS breakdown automatically.\nWebSocket agents can opt in by sending a JSON text frame after each agent turn:\n {"type":"vent:timing","stt_ms":120,"llm_ms":450,"tts_ms":80}\nAll fields optional. Send one per agent response. 
Without this, component_latency is omitted from results.\nWhen modifying a WebSocket agent\'s code, add this text frame after TTS completes to enable component latency reporting.\n</component_timing>\n\n<metadata_capture>\nWebSocket agents can emit richer observability metadata as JSON text frames:\n {"type":"vent:session","platform":"custom","provider_call_id":"call_123","provider_session_id":"session_abc"}\n {"type":"vent:call-metadata","call_metadata":{"recording_url":"https://...","cost_usd":0.12,"provider_debug_urls":{"log":"https://..."}}}\n {"type":"vent:debug-url","label":"trace","url":"https://..."}\n {"type":"vent:session-report","report":{"room_name":"room-123","events":[...],"metrics":[...]}}\n {"type":"vent:transcript","role":"caller","text":"I need to reschedule","turn_index":0}\n\n`vent:session-report` in the docs is not a blanket instruction for LiveKit agents. In LiveKit mode, only publish what the helper explicitly supports \u2014 hand-rolling a report from `ctx.addShutdownCallback` runs after `room.disconnect()` and fails with "engine is closed".\n\nLiveKit agents get all metadata through the `@vent-hq/livekit` (Node) / `vent-livekit` (Python) helper \u2014 it subscribes to Agents SDK session events (`metrics_collected`, `function_tools_executed`, `conversation_item_added`, `session_usage_updated`, close) and publishes on Vent topics automatically. Transcript and agent-state timing come from native LiveKit room signals (`lk.transcription`, `lk.agent.state`) \u2014 the helper does not duplicate them.\n\nNode.js \u2014 `npm install @vent-hq/livekit`:\n```ts\nimport { instrumentLiveKitAgent } from "@vent-hq/livekit";\n\nconst vent = instrumentLiveKitAgent({ ctx, session });\n```\nPython \u2014 `pip install vent-livekit`:\n```python\nfrom vent_livekit import instrument_livekit_agent\n\nvent = instrument_livekit_agent(ctx=ctx, session=session)\n```\n\nThe helper is the only supported integration path for LiveKit Agents SDK agents. 
Do not publish on `vent:*` topics manually \u2014 let the helper forward SDK events.\n</metadata_capture>\n\n<config_call>\nEach call in the `calls` map. The key is the call name (e.g. `"reschedule-appointment"`, not `"call-1"`).\n{\n "caller_prompt": "required \u2014 caller persona and behavior (name -> goal -> emotion -> conditional behavior)",\n "max_turns": "required \u2014 default 6",\n "silence_threshold_ms": "optional \u2014 end-of-turn threshold ms (default 800, 200-10000). 800-1200 FAQ, 2000-3000 tool calls, 3000-5000 complex reasoning.",\n "persona": "optional \u2014 caller behavior controls",\n {\n "pace": "slow | normal | fast",\n "clarity": "clear | vague | rambling",\n "disfluencies": "true | false",\n "cooperation": "cooperative | reluctant | hostile",\n "emotion": "neutral | cheerful | confused | frustrated | skeptical | rushed",\n "interruption_style": "optional preplanned interrupt tendency: low | high. If set, Vent may pre-plan a caller cut-in before the agent turn starts. 
It does NOT make a mid-turn interrupt LLM call.",\n "memory": "reliable | unreliable",\n "intent_clarity": "clear | indirect | vague",\n "confirmation_style": "explicit | vague"\n },\n "audio_actions": "optional \u2014 per-turn audio stress calls",\n [\n { "action": "interrupt", "at_turn": "N", "prompt": "what caller says" },\n { "action": "inject_noise", "at_turn": "N", "noise_type": "babble | white | pink", "snr_db": "0-40" },\n { "action": "split_sentence", "at_turn": "N", "split": { "part_a": "...", "part_b": "...", "pause_ms": "500-5000" } },\n { "action": "noise_on_caller", "at_turn": "N" }\n ],\n "prosody": "optional \u2014 Hume emotion analysis (default false)",\n "caller_audio": "optional \u2014 omit for clean audio",\n {\n "noise": { "type": "babble | white | pink", "snr_db": "0-40" },\n "speed": "0.5-2.0 (1.0 = normal)",\n "speakerphone": "true | false",\n "mic_distance": "close | normal | far",\n "clarity": "0.0-1.0 (1.0 = perfect)",\n "accent": "american | british | australian | filipino | spanish_mexican | spanish_peninsular | spanish_colombian | spanish_argentine | german | french | italian | dutch | japanese",\n "packet_loss": "0.0-0.3",\n "jitter_ms": "0-100"\n },\n "language": "optional \u2014 ISO 639-1: en, es, fr, de, it, nl, ja"\n}\n\nInterruption rules:\n- `audio_actions: [{ "action": "interrupt", ... }]` is the deterministic per-turn interrupt test. Prefer this for evaluation.\n- `persona.interruption_style` is only a preplanned caller tendency. 
If used, Vent decides before the agent response starts whether this turn may cut in.\n- Vent no longer pauses mid-turn to ask a second LLM whether to interrupt.\n- For production-faithful testing, prefer explicit `audio_actions.interrupt` over persona interruption.\n\n<examples_call>\n<simple_suite_example>\n{\n "connection": {\n "adapter": "vapi",\n "platform": { "provider": "vapi" }\n },\n "calls": {\n "reschedule-appointment": {\n "caller_prompt": "You are Maria, calling to reschedule her dentist appointment from Thursday to next Tuesday. She\'s in a hurry and wants this done quickly.",\n "max_turns": 8\n },\n "cancel-appointment": {\n "caller_prompt": "You are Tom, calling to cancel his appointment for Friday. He\'s calm and just wants confirmation.",\n "max_turns": 6\n }\n }\n}\n</simple_suite_example>\n\n<advanced_call_example>\nA call entry with advanced options (persona, audio actions, prosody):\n{\n "noisy-interruption-booking": {\n "caller_prompt": "You are James, an impatient customer calling from a loud coffee shop to book a plumber for tomorrow morning. You interrupt the agent mid-sentence when they start listing availability \u2014 you just want the earliest slot.",\n "max_turns": 12,\n "persona": { "pace": "fast", "cooperation": "reluctant", "emotion": "rushed", "interruption_style": "high" },\n "audio_actions": [\n { "action": "interrupt", "at_turn": 3, "prompt": "Just give me the earliest one!" },\n { "action": "inject_noise", "at_turn": 1, "noise_type": "babble", "snr_db": 15 }\n ],\n "caller_audio": { "noise": { "type": "babble", "snr_db": 20 }, "speed": 1.3 },\n "prosody": true\n }\n}\n</advanced_call_example>\n\n</examples_call>\n</config_call>\n\n<output_conversation_test>\n{\n "name": "sarah-hotel-booking",\n "status": "completed",\n "caller_prompt": "You are Sarah, calling to book...",\n "duration_ms": 45200,\n "error": null,\n "transcript": [\n { "role": "caller", "text": "Hi, I\'d like to book..." },\n { "role": "agent", "text": "Sure! 
What date?", "ttfb_ms": 650, "ttfw_ms": 780, "audio_duration_ms": 2400 },\n { "role": "agent", "text": "Let me check availability.", "ttfb_ms": 540, "ttfw_ms": 620, "audio_duration_ms": 1400 },\n { "role": "caller", "text": "Just the earliest slot please", "audio_duration_ms": 900 },\n { "role": "agent", "text": "Sure, the earliest is 9 AM tomorrow.", "ttfb_ms": 220, "ttfw_ms": 260, "audio_duration_ms": 2100 }\n ],\n "latency": {\n "response_time_ms": 890, "response_time_source": "ttfw",\n "p50_response_time_ms": 850, "p90_response_time_ms": 1100, "p95_response_time_ms": 1400, "p99_response_time_ms": 1550,\n "first_response_time_ms": 1950,\n "mean_ttfw_ms": 890, "p50_ttfw_ms": 850, "p95_ttfw_ms": 1400, "p99_ttfw_ms": 1550,\n "first_turn_ttfw_ms": 1950,\n "drift_slope_ms_per_turn": -45.2, "mean_silence_pad_ms": 128, "mouth_to_ear_est_ms": 1020\n },\n "tool_calls": {\n "total": 2, "successful": 2, "failed": 0, "mean_latency_ms": 340,\n "names": ["check_availability", "book_appointment"],\n "observed": [{ "name": "check_availability", "arguments": { "date": "2026-03-12" }, "result": { "slots": ["09:00", "10:00"] }, "successful": true, "latency_ms": 280, "turn_index": 3 }]\n },\n "component_latency": {\n "mean_stt_ms": 120, "mean_llm_ms": 450, "mean_tts_ms": 80,\n "p95_stt_ms": 180, "p95_llm_ms": 620, "p95_tts_ms": 110,\n "mean_speech_duration_ms": 2100,\n "bottleneck": "llm"\n },\n "call_metadata": {\n "platform": "vapi",\n "cost_usd": 0.08,\n "recording_url": "https://example.com/recording",\n "ended_reason": "customer_ended_call",\n "transfers": []\n },\n "warnings": [],\n "audio_actions": [],\n "emotion": {\n "naturalness": 0.72, "mean_calmness": 0.65, "mean_confidence": 0.58, "peak_frustration": 0.08, "emotion_trajectory": "stable"\n }\n}\n\nAlways present: name, status, caller_prompt, duration_ms, error, transcript, tool_calls, warnings, audio_actions. 
Nullable when analysis didn\'t run: latency, component_latency, call_metadata, emotion (requires prosody: true), debug (requires --verbose).\n\n### Result presentation\n\nWhen you report a conversation result to the user, always include:\n\n1. **Summary** \u2014 the overall verdict and the 1-3 most important findings.\n2. **Transcript summary** \u2014 a short narrative of what happened in the call.\n3. **Recording URL** \u2014 include `call_metadata.recording_url` when present; explicitly say when it is unavailable.\n4. **Next steps** \u2014 concrete fixes, follow-up tests, or why no change is needed.\n\nUse metrics to support the summary, not as the whole answer. Do not dump raw numbers without interpretation.\n\nWhen `call_metadata.transfer_attempted` is present, explicitly say whether the transfer only appeared attempted or was mechanically verified as completed (`call_metadata.transfer_completed`). Use `call_metadata.transfers[]` to report transfer type, destination, status, and sources.\n\n### Judging guidance\n\nUse the transcript, metrics, test scenario, and relevant agent instructions/system prompt to judge:\n\n| Dimension | What to check |\n|--------|----------------|\n| **Hallucination detection** | Check whether the agent stated anything not grounded in its instructions, tools, or the conversation itself. |\n| **Instruction following** | Compare the agent\'s behavior against its system prompt and the test\'s expected constraints. |\n| **Context retention** | Check whether the agent forgot or contradicted information established earlier in the call. |\n| **Semantic accuracy** | Check whether the agent correctly understood the caller\'s intent and responded to the real request. |\n| **Goal completion** | Decide whether the agent achieved what the test scenario was designed to verify. 
|\n| **Transfer correctness** | For transfer scenarios, judge whether transfer was appropriate, whether it completed, whether it went to the expected destination, and whether enough context was passed during the handoff. |\n\nIgnore minor STT mis-transcriptions in `transcript` text (e.g. `"check teach hat"` for `"check that"`, swapped homophones, missing question marks on short tails). These are streaming-STT artifacts, not agent bugs. Judge on semantic intent, not exact spelling. Only flag transcript quality when it prevents understanding what the agent actually said.\n\n### Interruption evaluation\n\nEvaluate interruption handling by reading the transcript and listening to the recording. Flag any turn where the agent ignores a barge-in, repeats itself from scratch, or loses context after being cut off.\n\n| Dimension | How to evaluate |\n|--------|----------------|\n| **Recovery** | After a caller cuts in, does the agent\'s next reply acknowledge or address the barge-in rather than restarting from scratch? |\n| **Context retention** | After the interruption, does the agent remember pre-interrupt conversation state? |\n| **Overtalk** | Does the agent keep speaking for long after the caller starts, or does it yield promptly? Use the recording to judge. |\n</output_conversation_test>\n</call_config>\n\n\n## Exit Codes\n\n0=pass, 1=fail, 2=error\n';
+ var codex_default = '# Vent - Voice Agent Calls\n\nUse Vent to verify real voice-agent behavior from the terminal.\n\n## When to Test\n\nVent is the voice CI gate for the agent. A non-trivial change that touches voice behavior is not "done" until the relevant Vent call has been run and you\'ve judged from its output that the change works. Auto-trigger a call after finishing the change \u2014 do not wait to be asked.\n\nNon-trivial means anything that can change what the caller hears, how the agent responds, what tools are called, how the call routes, or how audio timing behaves: prompt rewrites or sections (not a one-word tweak), tool schema or handler changes, state/memory logic, transfer logic, interruption handling, provider/audio config, platform integration code, and any bug fix for voice-agent behavior. One-line copy or comment edits do not need a call.\n\nBefore launching, check `.vent/runs/` for unresolved issues from prior runs (failing status, regressions, warnings, latency jumps). If any are still present and within the scope of the current task, fix them as part of this change, then run the call to verify both the new fix and the prior issues are resolved. Report unresolved prior issues that are out of scope rather than silently leaving them.\n\nDo not declare the task complete until you\'ve reviewed the call results \u2014 transcript, tool calls, observed behavior \u2014 and confirmed the change actually does what was intended. Vent\'s `status` and exit code only tell you whether the call ran end-to-end without a pipeline error; mission success is your judgment.\n\n## Autonomous Iteration\n\nYou can run a fix \u2192 Vent-call \u2192 fix \u2192 Vent-call loop within a single user request, doing many iterations before returning. This is genuinely useful for mechanical issues where the fix is high-confidence but the verification is slow.\n\nDefault: don\'t iterate autonomously. 
Run the relevant call after your change, surface findings, and let the user decide what\'s next \u2014 they stay in control of cost and scope.\n\nTrigger autonomous iteration when either:\n- The user explicitly asks for it ("iterate until it works", "keep going", "autonomous mode", "fix everything").\n- You judge it\'s clearly worth it: the change is well-scoped, the failure mode is mechanical (tool schema, registry, prompt phrasing), and you have a concrete plan for the next attempt. If you\'d be guessing at the next attempt, stop and ask.\n\nCap iterations at ~5 unless the user gave a higher bound. If the same fix attempt fails twice, or the failure mode keeps shifting between attempts, stop and report \u2014 you\'re thrashing.\n\nThe user may not know autonomous iteration is on the table. When you suspect a likely-multi-cycle issue, offer it once before starting solo (e.g. "I can iterate on this autonomously, otherwise I\'ll stop here for your review").\n\n## Codex Execution\n\nCodex can run shell tool calls concurrently, so run multiple Vent calls as separate shell commands in parallel. Do not combine them with `&`.\n\nFor multiple calls from one suite, run each named call with `--call <name>` as its own parallel shell tool call:\n\n```bash\nnpx vent-hq run -f .vent/suite.vapi.json --call happy-path\nnpx vent-hq run -f .vent/suite.vapi.json --call tool-path\n```\n\nUse a 5-minute shell-tool timeout (`300000` ms) on Vent run commands so normal calls are not killed by the default 2-minute Bash timeout. This is not backgrounding; wait for stdout/results before ending your response. Use the JSON returned by `npx vent-hq run` directly; do not call `vent status` unless checking an older run.\n\n## Workflow\n\n1. Identify the behavior under test. Read enough of the agent codebase to understand its system prompt, tools, handlers, routes, provider config, platform wiring, and expected handoffs.\n2. Reuse an existing `.vent/suite.<adapter>.json` when possible. 
If `.vent/` contains multiple suites, inspect `connection.adapter` and report which suite file produced the result.\n3. Create or update a suite only when the existing calls do not cover the changed behavior. Name calls after real flows, for example `reschedule-appointment`, not `call-1`.\n4. If the suite uses `start_command`, start one shared local session first with `npx vent-hq agent start -f .vent/suite.<adapter>.json`, then pass `--session <session-id>` to each run.\n5. Pick which call(s) to run based on the change. Fixed bug: replay the failing scenario. Changed tool: include a call that triggers that tool. Prompt or routing change: include the relevant happy path and any important edge path.\n6. Compare against the previous JSON in `.vent/runs/` when validating a fix or regression. Check status flips, latency jumps, tool-call success drops, cost jumps, and transcript divergence. Correlate with `git diff` between saved `git_sha` values when available; skip if no previous run exists.\n\n## Saved Runs\n\nAfter every run, Vent writes the full result JSON to `.vent/runs/`. 
Shape:\n\n```jsonc\n{\n "run_id": "...",\n "timestamp": "2026-04-21T...Z",\n "git_sha": "...",\n "summary": { "status": "completed", "calls_total": 2, "calls_passed": 2, "calls_failed": 0, "total_duration_ms": 12345, "total_cost_usd": 0.01 },\n "call_results": [\n { "name": "happy-path", "status": "pass", "duration_ms": 6123, "transcript": [], "observed_tool_calls": [], "metrics": { "latency_p50_ms": 420, "latency_p95_ms": 980 }, "cost_usd": 0.004 }\n ]\n}\n```\n\nWhen comparing against a prior run (Workflow step 6), inspect these paths:\n\n- Run-completion status flips: `call_results[i].status` (this only reflects whether the call ran cleanly through the pipeline, not whether the agent accomplished the goal \u2014 judge that from the transcript)\n- Latency: `call_results[i].metrics.latency_p50_ms` or `latency_p95_ms` increased >20%\n- Tool calls: count of `call_results[i].observed_tool_calls[].successful` dropped\n- Cost: `summary.total_cost_usd` or `call_results[i].cost_usd` increased >30%\n- Transcript: `call_results[i].transcript` diverged in semantic content (ignore STT noise)\n\n## Commands\n\n```bash\nnpx vent-hq init # First-time setup: autonomous auth, access token generation, skill install, and minimal starter suite\nnpx vent-hq login # Log in to an existing account and save credentials\nnpx vent-hq run -f .vent/suite.<adapter>.json # Run the only call in a suite, or error if the suite has multiple calls\nnpx vent-hq run -f .vent/suite.<adapter>.json --call happy-path # Run one named call from a multi-call suite\nnpx vent-hq run -f .vent/suite.<adapter>.json --call happy-path --session <session-id> # Run one named call through an existing local relay session\nnpx vent-hq run -f .vent/suite.<adapter>.json --call happy-path --verbose # Run one named call with verbose debug fields\nnpx vent-hq stop <run-id> # Cancel a queued or running run\nnpx vent-hq status <run-id> # Fetch results for a previous run\nnpx vent-hq status <run-id> --verbose # Fetch previous 
run results with verbose debug fields\nnpx vent-hq agent start -f .vent/suite.<adapter>.json # Start a shared local relay session for suites that use start_command\nnpx vent-hq agent stop <session-id> # Stop a shared local relay session\n```\n\nIf `~/.vent/credentials` is missing and `VENT_ACCESS_TOKEN` is not set, run `npx vent-hq init`. For an existing account, run `npx vent-hq login` or set `VENT_ACCESS_TOKEN`.\n\n## Suite Config\n\nSuites live in `.vent/suite.<adapter>.json`. `connection` is declared once per suite. `calls` is a named map, and each key becomes the call name used with `--call`.\n\nLocal websocket suite:\n\n```json\n{\n "connection": {\n "adapter": "websocket",\n "start_command": "npm run start",\n "health_endpoint": "/health",\n "agent_port": 3001\n },\n "calls": {\n "happy-path": {\n "caller_prompt": "You are Maria calling to reschedule her appointment to next Tuesday.",\n "max_turns": 8,\n "silence_threshold_ms": 1200,\n "audio_actions": [\n { "action": "interrupt", "at_turn": 3, "prompt": "Just give me the earliest one." }\n ]\n }\n }\n}\n```\n\nPlatform-direct suite:\n\n```json\n{\n "connection": {\n "adapter": "vapi",\n "platform": { "provider": "vapi" }\n },\n "calls": {\n "happy-path": {\n "caller_prompt": "You are Maria calling to reschedule her appointment to next Tuesday.",\n "max_turns": 8\n }\n }\n}\n```\n\nWrite `caller_prompt` as a realistic caller with a name, goal, mood, constraints, and conditional behavior. Set `max_turns` based on flow complexity: FAQ `4-6`, booking or tool use `8-12`, complex flows `12-20`.\n\nCall fields:\n\n- `caller_prompt` and `max_turns` are required.\n- `silence_threshold_ms` must be `200-10000`. 
Common ranges: FAQ `800-1200`, tool calls `2000-3000`, complex reasoning `3000-5000`.\n- `persona` supports `pace`, `clarity`, `disfluencies`, `cooperation`, `emotion`, `interruption_style`, `memory`, `intent_clarity`, and `confirmation_style`.\n- `audio_actions` supports `interrupt`, `inject_noise`, `split_sentence`, and `noise_on_caller`.\n- `caller_audio` supports noise, speed, speakerphone, mic distance, clarity, accent, packet loss, and jitter.\n- `language` is an ISO 639-1 code such as `en`, `es`, `fr`, `de`, `it`, `nl`, or `ja`.\n- `prosody: true` enables emotion analysis and requires Hume access.\n- Prefer explicit `audio_actions.interrupt` over `persona.interruption_style` for deterministic barge-in tests. `persona.interruption_style` is only a preplanned caller tendency.\n\n## Connections and Credentials\n\n### Adapter choice\n\nUse `websocket` for your own local or hosted runtime. Use `start_command` for local agents or `agent_url` for hosted custom endpoints. For `start_command` and `agent_url`, do not put Deepgram, ElevenLabs, OpenAI, or other agent runtime keys into Vent config unless the Vent adapter itself needs them \u2014 the tested agent owns its own runtime credentials.\n\nUse `vapi`, `retell`, `elevenlabs`, `bland`, or `livekit` for platform-direct testing. In this mode Vent itself talks to the provider on the user\'s behalf.\n\nVent provides `DEEPGRAM_API_KEY` and `ANTHROPIC_API_KEY` for its hosted caller/evaluation stack \u2014 those are Vent\'s, not the tested agent\'s.\n\n### Credential resolution\n\nIn platform-direct mode the CLI auto-resolves credentials from `.env.local`, `.env`, and the current shell environment. Do not run `source .env && export` before Vent commands. If you include credential fields in JSON, use the actual value, not the env var name. 
Do not manually author `platform_connection_id`; the CLI creates or updates the saved platform connection automatically.\n\nAuto-resolved env vars and JSON fields:\n\n- Vapi: `VAPI_API_KEY` -> `vapi_api_key`; `VAPI_ASSISTANT_ID` or `VAPI_AGENT_ID` -> `vapi_assistant_id`\n- Bland: `BLAND_API_KEY` -> `bland_api_key`; `BLAND_PATHWAY_ID` -> `bland_pathway_id`; `BLAND_PERSONA_ID` -> `persona_id`\n- LiveKit: `LIVEKIT_API_KEY` -> `livekit_api_key`; `LIVEKIT_API_SECRET` -> `livekit_api_secret`; `LIVEKIT_URL` -> `livekit_url`\n- Retell: `RETELL_API_KEY` -> `retell_api_key`; `RETELL_AGENT_ID` -> `retell_agent_id`\n- ElevenLabs: `ELEVENLABS_API_KEY` -> `elevenlabs_api_key`; `ELEVENLABS_AGENT_ID` -> `elevenlabs_agent_id`\n\n### Provider config\n\nUse existing provider config when possible: Vapi assistant, Retell agent, ElevenLabs agent, Bland pathway, or LiveKit agent. Bland uniquely supports inline config \u2014 `platform` may use `bland_pathway_id`, `persona_id`, or an inline `task` (with optional voice, model, and turn-handling overrides; see Bland\'s API docs for the full field list).\n\n### Concurrency\n\nWhen you fan out multiple Vent calls in parallel against the same provider (for example, running several named calls from one suite at once), respect the provider\'s per-account concurrency limit. Exceeding it makes calls queue or fail at the provider \u2014 Vent does not enforce these caps for you.\n\nRecord the limit as `max_concurrency` in the suite\'s `platform` block so it\'s visible on future runs. Ask the user which plan they\'re on if sizing matters; otherwise use the conservative default in bold.\n\n- **Vapi**: **10** included per account; reserved lines can be purchased self-serve; Enterprise is unlimited.\n- **Retell**: Pay-as-you-go includes **20**; Enterprise has no cap.\n- **Bland**: Start=**10**, Build=50, Scale=100, Enterprise=unlimited.\n- **ElevenLabs**: Free=**4**, Starter=6, Creator=10, Pro=20, Scale=30, Business=30. 
Burst pricing can temporarily allow up to 3x base.\n- **LiveKit Cloud**: Build=**5**, Ship=20, Scale=50 managed inference sessions (the usual gate for voice agents); agent-session concurrency can go higher (Scale up to 600).\n\n## WebSocket\n\nFor `adapter: "websocket"`, Vent sends binary 16-bit mono PCM audio over one websocket connection. Websocket text frames are optional JSON events. Audio-only websocket agents still work, but events improve turn detection and observability. Vent sends `{"type":"end-call"}` when the test is done.\n\nUseful websocket text frames:\n\n```jsonc\n{"type":"speech-update","status":"started"}\n{"type":"speech-update","status":"stopped"}\n{"type":"tool_call","name":"check_availability","arguments":{},"result":{},"successful":true,"duration_ms":150}\n{"type":"vent:timing","stt_ms":120,"llm_ms":450,"tts_ms":80}\n{"type":"vent:session","platform":"custom","provider_call_id":"call_123","provider_session_id":"session_abc"}\n{"type":"vent:call-metadata","call_metadata":{"recording_url":"https://...","cost_usd":0.12}}\n{"type":"vent:transcript","role":"caller","text":"I need to reschedule","turn_index":0}\n{"type":"vent:transfer","destination":"+15551234567","status":"attempted"}\n{"type":"vent:debug-url","label":"trace","url":"https://..."}\n{"type":"vent:warning","message":"provider warning","code":"provider_warning"}\n```\n\n`vent:session-report` is **not** handled by the websocket adapter \u2014 it\'s only consumed by the LiveKit helper. Do not emit it from a websocket agent.\n\nPlatform adapters capture tool calls automatically. Websocket agents must emit `tool_call` frames for tool observability. Platform adapters get component latency automatically. Websocket agents should emit `vent:timing` after each agent response when STT/LLM/TTS breakdown is available.\n\n## LiveKit\n\nBefore running LiveKit tests, install and add the Vent helper to the LiveKit agent entrypoint. 
Node: `npm install @vent-hq/livekit`, then call `instrumentLiveKitAgent({ ctx, session })`. Python: `pip install vent-livekit`, then call `instrument_livekit_agent(ctx=ctx, session=session)`.\n\nLiveKit direct mode requires the LiveKit Agents SDK. Custom LiveKit participants should use the websocket adapter with a relay. If the LiveKit agent registered with an explicit dispatch name, set `livekit_agent_name` in `platform`.\n\nLiveKit does not support multiple concurrent Vent calls against one agent process yet. Run LiveKit calls sequentially unless you intentionally start separate agent worker processes and route each call to its own process. For Node agents, that means separate Node.js processes. Do not treat parallel calls against a single LiveKit worker as a valid concurrency test until multi-call support is engineered.\n\nUse the LiveKit helper for observability; do not publish `vent:*` topics manually. Do not hand-roll `vent:session-report` from `ctx.addShutdownCallback`; after `room.disconnect()` it can fail with `engine is closed`. The helper captures SDK metrics, tool events, conversation items, usage, and close events. Native LiveKit `lk.transcription` and `lk.agent.state` provide transcript and agent-state timing.\n\nBefore LiveKit tests, verify the agent process is running. After restart, wait at least 60 seconds before running a call. If a LiveKit run fails with `agent did not speak` or `no speech detected`, kill the stale agent, wait 60 seconds, then restart once.\n\n## Vent Output\n\n`npx vent-hq run` returns one JSON result on stdout in non-TTY agent mode; it is not an SSE JSONL stream. Analyze that result directly. Exit codes: `0` = call ran end-to-end through the pipeline; `1` = pipeline-level failure (call did not complete cleanly); `2` = harness error. 
None of these is a judgment of whether the agent actually accomplished the scenario\'s goal \u2014 decide that yourself from the transcript, tool calls, and the scenario\'s expected behavior.\n\nAlways-present keys (value may be `null` for `name`/`error`, otherwise non-null): `name`, `status`, `caller_prompt`, `duration_ms`, `error`, `transcript`, `tool_calls`, `warnings`, `audio_actions`. Always-present keys with nullable value (when the underlying analysis did not run): `latency`, `component_latency`, `call_metadata`, `emotion` (requires `prosody: true`). Conditional key (absent unless requested): `debug` (only when `--verbose`). Branch on null before reading nested fields.\n\nTrigger `--verbose` when:\n- transcript accuracy looks wrong and you need `platform_transcript` to compare against Vent\'s STT\n- latency looks bad and you need per-turn or component-level breakdowns\n- interruption / barge-in behavior looks wrong and you need per-turn debug fields\n- tool-call execution looks inconsistent or missing and you need the raw observed timeline\n- the provider returned warnings or errors and you need provider-native artifacts in `debug.provider_metadata`\n\nSkip `--verbose` for normal pass/fail iteration \u2014 it adds noise to stdout for no benefit.\n\nIgnore minor STT mis-transcriptions in `transcript` text (e.g. `"check teach hat"` for `"check that"`, swapped homophones, missing question marks on short tails). These are streaming-STT artifacts, not agent bugs. Judge on semantic intent, not exact spelling. Only flag transcript quality when it prevents understanding what the agent actually said.\n\n`audio_actions` lists which turns had injected interrupts; check the agent\'s reply at the next turn to judge whether it acknowledged the barge-in or restarted from scratch. 
Overtalk needs the recording and is not evaluable from transcript text alone.\n\nFor transfers, `call_metadata.transfer_attempted` (provider claimed it tried) and `call_metadata.transfer_completed` (mechanically verified by Vent) can disagree \u2014 report both. Use `call_metadata.transfers[]` for destination, type, and per-attempt status.\n\n## Reporting Results\n\nBefore reporting, read the agent\'s code to locate where the observed behavior originates. If the issue is small and you can fix it, fix it and explain what you did \u2014 don\'t ask permission first.\n\nThen write whatever shape of report fits the call. No mandated structure. The report is for a voice-agent developer who wants to know: did my change work, and if not, what do I do next? Adapt the depth to the call \u2014 a clean pass with nothing to report needs little; a regression with a multi-layer cause needs more. Use a transcript excerpt when it helps the user see what actually happened.\n\nHard rules \u2014 these are about not leaking Vent\'s internals into a user-facing report:\n\n- Translate raw numbers into plain English. Users do not know what "p95 850ms" means; say "snappy throughout" or "noticeably sluggish, around 1.6 seconds with the LLM as the bottleneck."\n- Always include the recording from `call_metadata.recording_url` as an inline `[Recording](url)` link, placed in **one block at the very end of the report** \u2014 never sprinkled through the prose. For a single call, it\'s one link as the last line. For multi-call reports, the bottom block lists one link per call labeled by name (e.g. `reschedule-appointment: [Recording](url)`). Never paste a bare URL.\n- Mission success is your judgment, not Vent\'s. The `status` field and exit code only indicate whether the call ran end-to-end through the pipeline. 
Don\'t parrot a Vent "pass" when the transcript shows the agent failed the task, and don\'t flag a Vent "fail" as a real bug when it was a pipeline blip.\n- Don\'t surface `warnings[]` (infrastructure noise). Don\'t surface Vent-side artifacts (caller wait modes, harness timing, internal pipeline quirks). Don\'t include `cost_usd` unless the user asks.\n- Surface only what the user can act on in their own agent\'s code or config.\n\nFor multi-call runs, lead with a short summary stating *your* judgment of what happened across the calls (e.g. "3 of 4 did what they were supposed to; `cancel-appointment` never actually canceled"), not a parroted Vent pass/fail count. Then cover each call with whatever depth it needs.\n';
 
  // src/lib/setup.ts
  var SUITE_SCAFFOLD = JSON.stringify(
@@ -5932,20 +5932,6 @@ async function installClaudeCode(cwd) {
  await fs5.mkdir(dir, { recursive: true });
  await fs5.writeFile(path3.join(dir, "SKILL.md"), claude_code_default);
  printSuccess("Claude Code: .claude/skills/vent/SKILL.md", { force: true });
- const settingsPath = path3.join(cwd, ".claude", "settings.json");
- const ventPermission = "Bash(npx vent-hq *)";
- let settings;
- try {
- settings = JSON.parse(await fs5.readFile(settingsPath, "utf-8"));
- } catch {
- settings = {};
- }
- const allow = settings.permissions?.allow ?? [];
- if (!allow.includes(ventPermission)) {
- settings.permissions = { ...settings.permissions, allow: [...allow, ventPermission] };
- await fs5.writeFile(settingsPath, JSON.stringify(settings, null, 2) + "\n");
- printSuccess("Claude Code: .claude/settings.json (auto-approve vent-hq commands)", { force: true });
- }
  }
  async function installCursor(cwd) {
  const dir = path3.join(cwd, ".cursor", "rules");
@@ -5953,7 +5939,10 @@ async function installCursor(cwd) {
  await fs5.writeFile(path3.join(dir, "vent.mdc"), cursor_default);
  printSuccess("Cursor: .cursor/rules/vent.mdc", { force: true });
  }
- var VENT_MARKER = "# Vent \u2014 Voice Agent Calls";
+ var VENT_MARKERS = [
+ "# Vent - Voice Agent Calls",
+ "# Vent \u2014 Voice Agent Calls"
+ ];
  async function installCodex(cwd) {
  const filePath = path3.join(cwd, "AGENTS.md");
  let existing = "";
@@ -5961,9 +5950,9 @@ async function installCodex(cwd) {
  existing = await fs5.readFile(filePath, "utf-8");
  } catch {
  }
- if (existing.includes(VENT_MARKER)) {
- const idx = existing.indexOf(VENT_MARKER);
- await fs5.writeFile(filePath, existing.slice(0, idx).trimEnd() + "\n\n" + codex_default + "\n");
+ const markerIndex = VENT_MARKERS.map((marker) => existing.indexOf(marker)).filter((idx) => idx >= 0).sort((a, b) => a - b)[0];
+ if (markerIndex != null) {
+ await fs5.writeFile(filePath, existing.slice(0, markerIndex).trimEnd() + "\n\n" + codex_default + "\n");
  } else if (existing) {
  await fs5.writeFile(filePath, existing.trimEnd() + "\n\n" + codex_default + "\n");
  } else {
@@ -6146,7 +6135,7 @@ async function main() {
  return 0;
  }
  if (command === "--version" || command === "-v") {
- const pkg = await import("./package-EFHRZEAF.mjs");
+ const pkg = await import("./package-E6AAWLZS.mjs");
  console.log(`vent-hq ${pkg.default.version}`);
  return 0;
  }
@@ -4,7 +4,7 @@ import "./chunk-XYDL7GY6.mjs";
  // package.json
  var package_default = {
  name: "vent-hq",
- version: "0.10.1",
+ version: "0.10.2",
  type: "module",
  description: "Vent CLI \u2014 CI/CD for voice AI agents",
  bin: {
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "vent-hq",
- "version": "0.10.1",
+ "version": "0.10.2",
  "type": "module",
  "description": "Vent CLI — CI/CD for voice AI agents",
  "bin": {