@pinecall/skills 0.1.1 → 0.1.3

package/build.mjs CHANGED
@@ -33,7 +33,9 @@ const HOUSE_RULES = `## House rules — always apply
33
33
  - **TTS model is auto-derived from \`language\`** — non-English agents (e.g.
34
34
  \`language: "es"\`) default ElevenLabs to \`eleven_multilingual_v2\` so numbers,
35
35
  dates and currency are pronounced correctly (flash/turbo don't normalize text).
36
- English stays on \`eleven_flash_v2_5\`. Override with \`voice: { ..., model: "..." }\`.
36
+ English stays on \`eleven_flash_v2_5\`. To keep flash on a non-English agent
37
+ (lower latency/cost), set the top-level \`flash: true\` flag. To pin any model,
38
+ use \`voice: { ..., model: "..." }\` (explicit model always wins over \`flash\`).
37
39
  - **Greeting**: inbound → \`greeting\` field in \`pc.agent()\`; outbound → \`greeting\`
38
40
  field in \`agent.dial()\`. It is sugar for \`call.say()\` in \`call.started\`.
39
41
  - **Auth**: \`new Pinecall()\` reads \`PINECALL_API_KEY\` from env and auto-connects.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@pinecall/skills",
3
- "version": "0.1.1",
3
+ "version": "0.1.3",
4
4
  "description": "Agent Skills for the Pinecall SDK — installable into Claude Code, Antigravity, Cursor, Copilot and any agent that supports the open Skills format.",
5
5
  "type": "module",
6
6
  "license": "MIT",
@@ -30,7 +30,9 @@ table below indexes every page; open the `references/…` file for the full text
30
30
  - **TTS model is auto-derived from `language`** — non-English agents (e.g.
31
31
  `language: "es"`) default ElevenLabs to `eleven_multilingual_v2` so numbers,
32
32
  dates and currency are pronounced correctly (flash/turbo don't normalize text).
33
- English stays on `eleven_flash_v2_5`. Override with `voice: { ..., model: "..." }`.
33
+ English stays on `eleven_flash_v2_5`. To keep flash on a non-English agent
34
+ (lower latency/cost), set the top-level `flash: true` flag. To pin any model,
35
+ use `voice: { ..., model: "..." }` (explicit model always wins over `flash`).
34
36
  - **Greeting**: inbound → `greeting` field in `pc.agent()`; outbound → `greeting`
35
37
  field in `agent.dial()`. It is sugar for `call.say()` in `call.started`.
36
38
  - **Auth**: `new Pinecall()` reads `PINECALL_API_KEY` from env and auto-connects.
@@ -34,7 +34,9 @@ table below indexes every page; open the `references/…` file for the full text
34
34
  - **TTS model is auto-derived from `language`** — non-English agents (e.g.
35
35
  `language: "es"`) default ElevenLabs to `eleven_multilingual_v2` so numbers,
36
36
  dates and currency are pronounced correctly (flash/turbo don't normalize text).
37
- English stays on `eleven_flash_v2_5`. Override with `voice: { ..., model: "..." }`.
37
+ English stays on `eleven_flash_v2_5`. To keep flash on a non-English agent
38
+ (lower latency/cost), set the top-level `flash: true` flag. To pin any model,
39
+ use `voice: { ..., model: "..." }` (explicit model always wins over `flash`).
38
40
  - **Greeting**: inbound → `greeting` field in `pc.agent()`; outbound → `greeting`
39
41
  field in `agent.dial()`. It is sugar for `call.say()` in `call.started`.
40
42
  - **Auth**: `new Pinecall()` reads `PINECALL_API_KEY` from env and auto-connects.
@@ -52,7 +52,9 @@ const agent = pc.agent("mara", {
52
52
  - **TTS model is auto-derived from `language`** — non-English agents (e.g.
53
53
  `language: "es"`) default ElevenLabs to `eleven_multilingual_v2` so numbers,
54
54
  dates and currency are pronounced correctly (flash/turbo don't normalize text).
55
- English stays on `eleven_flash_v2_5`. Override with `voice: { ..., model: "..." }`.
55
+ English stays on `eleven_flash_v2_5`. To keep flash on a non-English agent
56
+ (lower latency/cost), set the top-level `flash: true` flag. To pin any model,
57
+ use `voice: { ..., model: "..." }` (explicit model always wins over `flash`).
56
58
  - **Greeting**: inbound → `greeting` field in `pc.agent()`; outbound → `greeting`
57
59
  field in `agent.dial()`. It is sugar for `call.say()` in `call.started`.
58
60
  - **Auth**: `new Pinecall()` reads `PINECALL_API_KEY` from env and auto-connects.
@@ -26,6 +26,7 @@ table below indexes every page; open the `references/…` file for the full text
26
26
  | **Tools and Functions** | Let your agent take actions: look up data, transfer calls, book appointments. | [`references/guides/tools-and-functions.md`](references/guides/tools-and-functions.md) · [docs](https://docs.pinecall.io/guides/tools-and-functions) |
27
27
  | **Knowledge bases (RAG)** | Tutorial — ground a voice or chat agent on your own documents with retrieval-augmented generation. | [`references/guides/knowledge-bases.md`](references/guides/knowledge-bases.md) · [docs](https://docs.pinecall.io/guides/knowledge-bases) |
28
28
  | **Multi-Tenant Dashboards** | Host many tenants on one Pinecall instance with scoped event streams. | [`references/guides/multi-tenant.md`](references/guides/multi-tenant.md) · [docs](https://docs.pinecall.io/guides/multi-tenant) |
29
+ | **Self-Hosted LLM Gateway** | Consume Pinecall's hosted open model (Qwen3) for chat and structured analysis over an authenticated, plan-gated streaming endpoint. | [`references/guides/self-hosted-llm.md`](references/guides/self-hosted-llm.md) · [docs](https://docs.pinecall.io/guides/self-hosted-llm) |
29
30
  | **SSE Event Streaming** | Stream agent events to your frontend in real time with Server-Sent Events. | [`references/guides/sse-streaming.md`](references/guides/sse-streaming.md) · [docs](https://docs.pinecall.io/guides/sse-streaming) |
30
31
  | **WebSocket Event Streaming** | Stream agent events over WebSocket for bidirectional, real-time communication with your frontend. | [`references/guides/ws-streaming.md`](references/guides/ws-streaming.md) · [docs](https://docs.pinecall.io/guides/ws-streaming) |
31
32
  | **Dev Mode** | Run dev and production agents on the same phone number, with zero extra Twilio cost. | [`references/guides/dev-mode.md`](references/guides/dev-mode.md) · [docs](https://docs.pinecall.io/guides/dev-mode) |
@@ -61,7 +62,9 @@ const agent = pc.agent("mara", {
61
62
  - **TTS model is auto-derived from `language`** — non-English agents (e.g.
62
63
  `language: "es"`) default ElevenLabs to `eleven_multilingual_v2` so numbers,
63
64
  dates and currency are pronounced correctly (flash/turbo don't normalize text).
64
- English stays on `eleven_flash_v2_5`. Override with `voice: { ..., model: "..." }`.
65
+ English stays on `eleven_flash_v2_5`. To keep flash on a non-English agent
66
+ (lower latency/cost), set the top-level `flash: true` flag. To pin any model,
67
+ use `voice: { ..., model: "..." }` (explicit model always wins over `flash`).
65
68
  - **Greeting**: inbound → `greeting` field in `pc.agent()`; outbound → `greeting`
66
69
  field in `agent.dial()`. It is sugar for `call.say()` in `call.started`.
67
70
  - **Auth**: `new Pinecall()` reads `PINECALL_API_KEY` from env and auto-connects.
@@ -0,0 +1,148 @@
1
+ ---
2
+ title: "Self-Hosted LLM Gateway"
3
+ description: "Consume Pinecall's hosted open model (Qwen3) for chat and structured analysis over an authenticated, plan-gated streaming endpoint."
4
+ ---
5
+
6
+ # Self-Hosted LLM Gateway
7
+
8
+ Pinecall hosts an open LLM and exposes it through an authenticated streaming
9
+ endpoint on the sdk-server. Use it for any task that wants a cheap, in-house LLM
10
+ instead of a paid per-token provider: **chat / agent loops** and **structured
11
+ analysis** (classification, extraction, summarization, recommendations).
12
+
13
+ | Model | Size | Best for |
14
+ |-------|------|----------|
15
+ | `qwen3:14b` | ~9 GB | **default** — hybrid model: clean JSON/analysis with thinking off, step-by-step reasoning with thinking on |
16
+ | `deepseek-r1:14b` | ~9 GB | dedicated reasoning — **coming soon** |
17
+ | `qwen2.5-coder:14b` | ~9 GB | code generation, refactors, tool/JSON authoring — **coming soon** |
18
+ | `mistral-nemo:12b` | ~7 GB | strong multilingual + 128k context — **coming soon** |
19
+
20
+ > Models flagged **coming soon** aren't live yet — `GET /api/llm/models` always
21
+ > returns the currently available set.
22
+
23
+ ## Authentication & access
24
+
25
+ - **Base URL:** `https://voice.pinecall.io`
26
+ - **Auth:** a Pinecall API key via `X-API-Key: <key>` **or** `Authorization: Bearer <key>`.
27
+ - **Plan gating:** **paid plans only** (`starter`, `pro`, `enterprise`). Both `free`
28
+ and `free_trial` receive **`402 SUBSCRIPTION_REQUIRED`**.
29
+
30
+ ## `POST /api/llm/chat`
31
+
32
+ Streams the completion as **Server-Sent Events**.
33
+
34
+ ### Request body
35
+
36
+ ```jsonc
37
+ {
38
+ "messages": [{ "role": "user", "content": "..." }], // required
39
+ "system": "optional system prompt",
40
+ "model": "qwen3:14b", // default: qwen3:14b
41
+ "mode": "chat" | "analysis", // default: "chat"
42
+ "think": false, // reasoning on/off (default false; analysis forces false)
43
+ "temperature": 0.7,
44
+ "max_tokens": 512,
45
+ "format": { /* JSON schema */ } | "json" // analysis mode only
46
+ }
47
+ ```
48
+
49
+ Qwen3 is a **hybrid** model: `think: false` (the default) returns a clean, direct
50
+ answer — best for JSON and low latency. `think: true` lets it reason step-by-step
51
+ first (better on hard problems); the reasoning never leaks into the streamed
52
+ answer. `mode: "analysis"` always forces thinking off so JSON stays clean.
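The body rules above can be mirrored client-side before a request is sent. The following is a minimal sketch, not part of any Pinecall SDK: `ChatRequest` and `buildChatRequest` are hypothetical names, and the empty-`messages` guard and the rejection of `format` outside analysis mode are assumptions drawn from the field comments rather than confirmed server behavior.

```typescript
// Hypothetical shape of the documented request body.
interface ChatMessage {
  role: string;
  content: string;
}

interface ChatRequest {
  messages: ChatMessage[];
  system?: string;
  model?: string;
  mode?: "chat" | "analysis";
  think?: boolean;
  temperature?: number;
  max_tokens?: number;
  format?: Record<string, unknown> | "json";
}

// Hypothetical helper: applies the documented defaults and the
// "analysis forces thinking off" rule before the body is serialized.
function buildChatRequest(input: ChatRequest): ChatRequest {
  if (input.messages.length === 0) {
    // Assumption: an empty array is treated like a missing one.
    throw new Error("messages is required and must not be empty");
  }
  const mode = input.mode ?? "chat";
  if (input.format !== undefined && mode !== "analysis") {
    // Assumption: `format` is documented for analysis mode only.
    throw new Error('format is only documented for mode: "analysis"');
  }
  return {
    ...input,
    model: input.model ?? "qwen3:14b",
    mode,
    // Mirror the gateway: analysis always runs with thinking off.
    think: mode === "analysis" ? false : input.think ?? false,
  };
}
```

Passing the result to `JSON.stringify` gives a body in which `think` can never be `true` alongside `mode: "analysis"`, so the request matches what the gateway would do anyway.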
53
+
54
+ ### SSE event stream
55
+
56
+ ```
57
+ data: {"type":"token","content":"..."} // repeated — incremental text
58
+ data: {"type":"done","usage":{"input_tokens":N,"output_tokens":M}}
59
+ data: {"type":"error","error":"...","code":"UPSTREAM_ERROR|INTERNAL"}
60
+ data: [DONE] // terminator
61
+ ```
62
+
63
+ ### Errors
64
+
65
+ | Status | Code | Meaning |
66
+ |--------|------|---------|
67
+ | 401 | `MISSING_KEY` / `INVALID_KEY` | no or bad API key |
68
+ | 402 | `SUBSCRIPTION_REQUIRED` | tier is `free` or `free_trial` |
69
+ | 400 | `MISSING_MESSAGES` / `BAD_MODEL` / `BAD_REQUEST` | invalid request |
70
+
71
+ ## `GET /api/llm/models`
72
+
73
+ Same auth + gate. Returns the available models, the default, and the caller's tier —
74
+ handy to probe access before streaming. **This is the source of truth for what's
75
+ currently available** (the list grows over time).
76
+
77
+ ```json
78
+ { "models": ["qwen3:14b"], "default": "qwen3:14b", "tier": "pro" }
79
+ ```
80
+
81
+ ## Chat — streaming agent loop
82
+
83
+ ```ts
84
+ const res = await fetch("https://voice.pinecall.io/api/llm/chat", {
85
+ method: "POST",
86
+ headers: {
87
+ "Content-Type": "application/json",
88
+ "X-API-Key": process.env.PINECALL_API_KEY!,
89
+ },
90
+ body: JSON.stringify({
91
+ model: "qwen3:14b",
92
+ system: "You are a concise assistant.",
93
+ messages: [{ role: "user", content: "Summarize today's bookings." }],
94
+ // think: true, // ← opt into step-by-step reasoning for harder questions
95
+ }),
96
+ });
97
+
98
+ const reader = res.body!.getReader();
99
+ const dec = new TextDecoder();
100
+ let buf = "";
101
+ for (;;) {
102
+ const { value, done } = await reader.read();
103
+ if (done) break;
104
+ buf += dec.decode(value, { stream: true });
105
+ for (const line of buf.split("\n\n")) {
106
+ if (!line.startsWith("data: ")) continue;
107
+ const data = line.slice(6);
108
+ if (data === "[DONE]") break;
109
+ const evt = JSON.parse(data);
110
+ if (evt.type === "token") process.stdout.write(evt.content);
111
+ }
112
+ buf = buf.slice(buf.lastIndexOf("\n\n") + 2);
113
+ }
114
+ ```
115
+
116
+ ## Analysis — structured JSON (schema-enforced)
117
+
118
+ Set `mode: "analysis"` and pass a JSON **schema** in `format`. The gateway routes
119
+ analysis requests through a native path that constrains the output to your schema
120
+ (and forces thinking off) — ideal for recommendations and extraction.
121
+
122
+ ```ts
123
+ const body = {
124
+ model: "qwen3:14b",
125
+ mode: "analysis",
126
+ system: "You are a pricing engine. Return JSON only.",
127
+ messages: [{ role: "user", content: "Service: deep-tissue massage, $80, 95% utilization, 60% margin. Recommend an optimal price." }],
128
+ format: {
129
+ type: "object",
130
+ properties: {
131
+ suggestedPrice: { type: "number" },
132
+ confidence: { type: "string", enum: ["low", "medium", "high"] },
133
+ rationale: { type: "string" },
134
+ },
135
+ required: ["suggestedPrice", "confidence", "rationale"],
136
+ },
137
+ };
138
+ // POST as above, accumulate the `token` chunks into `text`, then:
139
+ const rec = JSON.parse(text); // { suggestedPrice, confidence, rationale }
140
+ ```
141
+
142
+ > **Warning:** Pass a real JSON-schema **object**. The string `"json"` (OpenAI-style
143
+ > `response_format`) only nudges the model toward JSON — it does **not** enforce a shape.
144
+
145
+ > **Note:** This open model is for **in-app responders, analysis, and
146
+ > recommendations**. For **live voice / WhatsApp agents**, the Pinecall server-side
147
+ > LLM supports OpenAI / Mistral / Google / Anthropic — see
148
+ > [LLM Providers](/reference/llm-providers).
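The streaming loop in the guide re-splits the whole buffer on every read, so a frame cut in half by a network chunk boundary can reach `JSON.parse` before it is complete. A more defensive variant keeps only the unfinished tail between reads. The sketch below is one way to do that, assuming frames are separated by a blank line (`\n\n`) as in the guide's own loop; `GatewayEvent` and `SseFrameParser` are hypothetical names, not SDK exports.

```typescript
// Event shapes taken from the "SSE event stream" listing in the guide.
type GatewayEvent =
  | { type: "token"; content: string }
  | { type: "done"; usage: { input_tokens: number; output_tokens: number } }
  | { type: "error"; error: string; code: string };

// Hypothetical helper: feed it decoded text chunks, get back complete events.
class SseFrameParser {
  private buf = "";
  private finished = false;

  // True once the `data: [DONE]` terminator has been seen.
  get done(): boolean {
    return this.finished;
  }

  push(chunk: string): GatewayEvent[] {
    const events: GatewayEvent[] = [];
    if (this.finished) return events;

    this.buf += chunk;
    const frames = this.buf.split("\n\n");
    // The last element is either "" or an incomplete frame: keep it for later.
    this.buf = frames.pop() ?? "";

    for (const frame of frames) {
      if (!frame.startsWith("data: ")) continue;
      const data = frame.slice(6);
      if (data === "[DONE]") {
        this.finished = true;
        break;
      }
      events.push(JSON.parse(data) as GatewayEvent);
    }
    return events;
  }
}
```

Inside the read loop this would replace the inner `for` and the trailing `buf.slice(...)`: call `parser.push(dec.decode(value, { stream: true }))`, handle each returned event by `type` (including `error`), and leave the outer loop once `parser.done` is true.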
@@ -47,7 +47,9 @@ const agent = pc.agent("mara", {
47
47
  - **TTS model is auto-derived from `language`** — non-English agents (e.g.
48
48
  `language: "es"`) default ElevenLabs to `eleven_multilingual_v2` so numbers,
49
49
  dates and currency are pronounced correctly (flash/turbo don't normalize text).
50
- English stays on `eleven_flash_v2_5`. Override with `voice: { ..., model: "..." }`.
50
+ English stays on `eleven_flash_v2_5`. To keep flash on a non-English agent
51
+ (lower latency/cost), set the top-level `flash: true` flag. To pin any model,
52
+ use `voice: { ..., model: "..." }` (explicit model always wins over `flash`).
51
53
  - **Greeting**: inbound → `greeting` field in `pc.agent()`; outbound → `greeting`
52
54
  field in `agent.dial()`. It is sugar for `call.say()` in `call.started`.
53
55
  - **Auth**: `new Pinecall()` reads `PINECALL_API_KEY` from env and auto-connects.
@@ -36,7 +36,9 @@ table below indexes every page; open the `references/…` file for the full text
36
36
  - **TTS model is auto-derived from `language`** — non-English agents (e.g.
37
37
  `language: "es"`) default ElevenLabs to `eleven_multilingual_v2` so numbers,
38
38
  dates and currency are pronounced correctly (flash/turbo don't normalize text).
39
- English stays on `eleven_flash_v2_5`. Override with `voice: { ..., model: "..." }`.
39
+ English stays on `eleven_flash_v2_5`. To keep flash on a non-English agent
40
+ (lower latency/cost), set the top-level `flash: true` flag. To pin any model,
41
+ use `voice: { ..., model: "..." }` (explicit model always wins over `flash`).
40
42
  - **Greeting**: inbound → `greeting` field in `pc.agent()`; outbound → `greeting`
41
43
  field in `agent.dial()`. It is sugar for `call.say()` in `call.started`.
42
44
  - **Auth**: `new Pinecall()` reads `PINECALL_API_KEY` from env and auto-connects.
@@ -77,9 +77,46 @@ The server picks the ElevenLabs model from your `language`:
77
77
  | `en` (or unset) | `eleven_flash_v2_5` | Fastest, optimized for real-time streaming |
78
78
  | Any non-English (`es`, `fr`, `de`, …) | `eleven_multilingual_v2` | Flash/Turbo don't normalize text, so Spanish & other languages mispronounce numbers, dates, currency and abbreviations. The multilingual model reads them naturally. |
79
79
 
80
- > `eleven_multilingual_v2` is billed at a higher rate than flash (it's a higher-quality model). If you'd rather keep the faster/cheaper flash model for a non-English agent, pin it explicitly (see below).
80
+ > `eleven_multilingual_v2` is billed at a higher rate than flash (it's a higher-quality model). If you'd rather keep the faster/cheaper flash model for a non-English agent, use the `flash` shortcut or pin the model explicitly (both below).
81
81
 
82
- **Override the model** with the optional `model` field it always wins over the auto-default:
82
+ #### `flash: true` — keep flash on a non-English agent
83
+
84
+ The multilingual model trades a little **latency** for much better pronunciation.
85
+ If your non-English agent should prioritize **lowest latency / lowest cost** over
86
+ pronunciation quality, set the top-level `flash` flag — it opts out of the
87
+ multilingual auto-default and keeps `eleven_flash_v2_5`:
88
+
89
+ ```typescript
90
+ const agent = pc.agent("sofia", {
91
+ prompt: "Sos Sofía, asistente de la clínica.",
92
+ llm: "openai/gpt-5-chat-latest",
93
+ voice: "elevenlabs/agus",
94
+ stt: "deepgram/flux",
95
+ language: "es",
96
+ flash: true, // ← stay on eleven_flash_v2_5 despite language: "es"
97
+ });
98
+ ```
99
+
100
+ `flash` is a sibling of `language` (not inside `voice`), so it reads cleanly with
101
+ the rest of the shortcuts. Semantics:
102
+
103
+ | Config | Resulting ElevenLabs model |
104
+ |---|---|
105
+ | `language: "es"` | `eleven_multilingual_v2` (auto) |
106
+ | `language: "es"`, `flash: true` | `eleven_flash_v2_5` |
107
+ | `language: "en"` (with or without `flash`) | `eleven_flash_v2_5` |
108
+ | `voice: { model: "..." }` (any `flash`/`language`) | the pinned model — explicit always wins |
109
+
110
+ Notes:
111
+
112
+ - **ElevenLabs only.** `flash` has no effect on Cartesia or Polly.
113
+ - **No-op for English** — English already defaults to flash.
114
+ - **An explicit `voice: { model }` always wins** over `flash`. Use `flash: true`
115
+ for the common "I want the cheap fast model" case; use the `model` field when
116
+ you need a specific model id.
117
+ - Works per-channel too: `phoneNumbers: [{ number, language: "es", flash: true }]`.
118
+
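The precedence described in the table and notes above can be written out as a small function. This is only an illustration of the documented rules: the real selection happens on the Pinecall server, `resolveElevenLabsModel` and its input type are hypothetical, and treating region-tagged codes such as `en-US` as English is an assumption the source does not confirm.

```typescript
// Hypothetical subset of the agent config that affects model selection.
interface TtsInputs {
  language?: string;
  flash?: boolean;
  voice?: string | { model?: string };
}

// Sketch of the documented precedence:
// explicit voice.model > flash: true > language-based auto-default.
function resolveElevenLabsModel(cfg: TtsInputs): string {
  // 1. An explicitly pinned model always wins.
  if (typeof cfg.voice === "object" && cfg.voice.model) {
    return cfg.voice.model;
  }
  // 2. English (or unset) stays on flash; `flash: true` opts other languages in.
  const lang = (cfg.language ?? "en").toLowerCase();
  const isEnglish = lang === "en" || lang.startsWith("en-"); // assumption for en-US etc.
  if (isEnglish || cfg.flash === true) {
    return "eleven_flash_v2_5";
  }
  // 3. Any other language gets the multilingual auto-default.
  return "eleven_multilingual_v2";
}
```

Reading the function top to bottom gives the same four rows as the table: `flash` only matters when the language is non-English and no model is pinned.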
119
+ **Override the model** with the optional `model` field — it always wins over both the auto-default and `flash`:
83
120
 
84
121
  ```typescript
85
122
  voice: {
@@ -49,7 +49,9 @@ const agent = pc.agent("mara", {
49
49
  - **TTS model is auto-derived from `language`** — non-English agents (e.g.
50
50
  `language: "es"`) default ElevenLabs to `eleven_multilingual_v2` so numbers,
51
51
  dates and currency are pronounced correctly (flash/turbo don't normalize text).
52
- English stays on `eleven_flash_v2_5`. Override with `voice: { ..., model: "..." }`.
52
+ English stays on `eleven_flash_v2_5`. To keep flash on a non-English agent
53
+ (lower latency/cost), set the top-level `flash: true` flag. To pin any model,
54
+ use `voice: { ..., model: "..." }` (explicit model always wins over `flash`).
53
55
  - **Greeting**: inbound → `greeting` field in `pc.agent()`; outbound → `greeting`
54
56
  field in `agent.dial()`. It is sugar for `call.say()` in `call.started`.
55
57
  - **Auth**: `new Pinecall()` reads `PINECALL_API_KEY` from env and auto-connects.
@@ -35,7 +35,8 @@ const agent = pc.agent("my-agent", {
35
35
  | Config field | Type | Description |
36
36
  |---|---|---|
37
37
  | `voice` | `string \| VoiceConfig` | TTS provider — shortcut or full config |
38
- | `language` | `string` | BCP-47 language code |
38
+ | `language` | `string` | BCP-47 language code. Non-English auto-selects ElevenLabs `eleven_multilingual_v2` |
39
+ | `flash` | `boolean` | Keep ElevenLabs `eleven_flash_v2_5` on a non-English agent (lowest latency/cost) instead of the multilingual auto-default. ElevenLabs-only; see [TTS Providers](/reference/tts-providers) |
39
40
  | `stt` | `string \| STTConfig` | STT provider — shortcut or full config |
40
41
  | `llm` | `LLMConfig` | LLM provider, model, prompt, enabled flag |
41
42
  | `tools` | `Tool[]` | Declarative tools created with `tool()` + Zod schemas (auto-executed) |
@@ -30,7 +30,9 @@ table below indexes every page; open the `references/…` file for the full text
30
30
  - **TTS model is auto-derived from `language`** — non-English agents (e.g.
31
31
  `language: "es"`) default ElevenLabs to `eleven_multilingual_v2` so numbers,
32
32
  dates and currency are pronounced correctly (flash/turbo don't normalize text).
33
- English stays on `eleven_flash_v2_5`. Override with `voice: { ..., model: "..." }`.
33
+ English stays on `eleven_flash_v2_5`. To keep flash on a non-English agent
34
+ (lower latency/cost), set the top-level `flash: true` flag. To pin any model,
35
+ use `voice: { ..., model: "..." }` (explicit model always wins over `flash`).
34
36
  - **Greeting**: inbound → `greeting` field in `pc.agent()`; outbound → `greeting`
35
37
  field in `agent.dial()`. It is sugar for `call.say()` in `call.started`.
36
38
  - **Auth**: `new Pinecall()` reads `PINECALL_API_KEY` from env and auto-connects.
@@ -31,7 +31,9 @@ table below indexes every page; open the `references/…` file for the full text
31
31
  - **TTS model is auto-derived from `language`** — non-English agents (e.g.
32
32
  `language: "es"`) default ElevenLabs to `eleven_multilingual_v2` so numbers,
33
33
  dates and currency are pronounced correctly (flash/turbo don't normalize text).
34
- English stays on `eleven_flash_v2_5`. Override with `voice: { ..., model: "..." }`.
34
+ English stays on `eleven_flash_v2_5`. To keep flash on a non-English agent
35
+ (lower latency/cost), set the top-level `flash: true` flag. To pin any model,
36
+ use `voice: { ..., model: "..." }` (explicit model always wins over `flash`).
35
37
  - **Greeting**: inbound → `greeting` field in `pc.agent()`; outbound → `greeting`
36
38
  field in `agent.dial()`. It is sugar for `call.say()` in `call.started`.
37
39
  - **Auth**: `new Pinecall()` reads `PINECALL_API_KEY` from env and auto-connects.
@@ -30,7 +30,9 @@ table below indexes every page; open the `references/…` file for the full text
30
30
  - **TTS model is auto-derived from `language`** — non-English agents (e.g.
31
31
  `language: "es"`) default ElevenLabs to `eleven_multilingual_v2` so numbers,
32
32
  dates and currency are pronounced correctly (flash/turbo don't normalize text).
33
- English stays on `eleven_flash_v2_5`. Override with `voice: { ..., model: "..." }`.
33
+ English stays on `eleven_flash_v2_5`. To keep flash on a non-English agent
34
+ (lower latency/cost), set the top-level `flash: true` flag. To pin any model,
35
+ use `voice: { ..., model: "..." }` (explicit model always wins over `flash`).
34
36
  - **Greeting**: inbound → `greeting` field in `pc.agent()`; outbound → `greeting`
35
37
  field in `agent.dial()`. It is sugar for `call.say()` in `call.started`.
36
38
  - **Auth**: `new Pinecall()` reads `PINECALL_API_KEY` from env and auto-connects.
@@ -33,7 +33,9 @@ table below indexes every page; open the `references/…` file for the full text
33
33
  - **TTS model is auto-derived from `language`** — non-English agents (e.g.
34
34
  `language: "es"`) default ElevenLabs to `eleven_multilingual_v2` so numbers,
35
35
  dates and currency are pronounced correctly (flash/turbo don't normalize text).
36
- English stays on `eleven_flash_v2_5`. Override with `voice: { ..., model: "..." }`.
36
+ English stays on `eleven_flash_v2_5`. To keep flash on a non-English agent
37
+ (lower latency/cost), set the top-level `flash: true` flag. To pin any model,
38
+ use `voice: { ..., model: "..." }` (explicit model always wins over `flash`).
37
39
  - **Greeting**: inbound → `greeting` field in `pc.agent()`; outbound → `greeting`
38
40
  field in `agent.dial()`. It is sugar for `call.say()` in `call.started`.
39
41
  - **Auth**: `new Pinecall()` reads `PINECALL_API_KEY` from env and auto-connects.
@@ -34,7 +34,9 @@ table below indexes every page; open the `references/…` file for the full text
34
34
  - **TTS model is auto-derived from `language`** — non-English agents (e.g.
35
35
  `language: "es"`) default ElevenLabs to `eleven_multilingual_v2` so numbers,
36
36
  dates and currency are pronounced correctly (flash/turbo don't normalize text).
37
- English stays on `eleven_flash_v2_5`. Override with `voice: { ..., model: "..." }`.
37
+ English stays on `eleven_flash_v2_5`. To keep flash on a non-English agent
38
+ (lower latency/cost), set the top-level `flash: true` flag. To pin any model,
39
+ use `voice: { ..., model: "..." }` (explicit model always wins over `flash`).
38
40
  - **Greeting**: inbound → `greeting` field in `pc.agent()`; outbound → `greeting`
39
41
  field in `agent.dial()`. It is sugar for `call.say()` in `call.started`.
40
42
  - **Auth**: `new Pinecall()` reads `PINECALL_API_KEY` from env and auto-connects.