@pinecall/skills 0.1.4 → 0.1.6

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@pinecall/skills",
3
- "version": "0.1.4",
3
+ "version": "0.1.6",
4
4
  "description": "Agent Skills for the Pinecall SDK — installable into Claude Code, Antigravity, Cursor, Copilot and any agent that supports the open Skills format.",
5
5
  "type": "module",
6
6
  "license": "MIT",
@@ -26,7 +26,6 @@ table below indexes every page; open the `references/…` file for the full text
26
26
  | **Tools and Functions** | Let your agent take actions: look up data, transfer calls, book appointments. | [`references/guides/tools-and-functions.md`](references/guides/tools-and-functions.md) · [docs](https://docs.pinecall.io/guides/tools-and-functions) |
27
27
  | **Knowledge bases (RAG)** | Tutorial — ground a voice or chat agent on your own documents with retrieval-augmented generation. | [`references/guides/knowledge-bases.md`](references/guides/knowledge-bases.md) · [docs](https://docs.pinecall.io/guides/knowledge-bases) |
28
28
  | **Multi-Tenant Dashboards** | Host many tenants on one Pinecall instance with scoped event streams. | [`references/guides/multi-tenant.md`](references/guides/multi-tenant.md) · [docs](https://docs.pinecall.io/guides/multi-tenant) |
29
- | **Self-Hosted LLM Gateway** | Consume Pinecall's hosted open model (Qwen3) for chat and structured analysis over an authenticated, plan-gated streaming endpoint. | [`references/guides/self-hosted-llm.md`](references/guides/self-hosted-llm.md) · [docs](https://docs.pinecall.io/guides/self-hosted-llm) |
30
29
  | **SSE Event Streaming** | Stream agent events to your frontend in real time with Server-Sent Events. | [`references/guides/sse-streaming.md`](references/guides/sse-streaming.md) · [docs](https://docs.pinecall.io/guides/sse-streaming) |
31
30
  | **WebSocket Event Streaming** | Stream agent events over WebSocket for bidirectional, real-time communication with your frontend. | [`references/guides/ws-streaming.md`](references/guides/ws-streaming.md) · [docs](https://docs.pinecall.io/guides/ws-streaming) |
32
31
  | **Dev Mode** | Run dev and production agents on the same phone number, with zero extra Twilio cost. | [`references/guides/dev-mode.md`](references/guides/dev-mode.md) · [docs](https://docs.pinecall.io/guides/dev-mode) |
@@ -20,6 +20,7 @@ table below indexes every page; open the `references/…` file for the full text
20
20
  | **STT Providers** | Speech-to-text providers, models, and tuning parameters. | [`references/reference/stt-providers.md`](references/reference/stt-providers.md) · [docs](https://docs.pinecall.io/reference/stt-providers) |
21
21
  | **TTS Providers** | Text-to-speech providers, voices, and tuning parameters. | [`references/reference/tts-providers.md`](references/reference/tts-providers.md) · [docs](https://docs.pinecall.io/reference/tts-providers) |
22
22
  | **LLM Providers** | Server-side LLM providers and configuration. | [`references/reference/llm-providers.md`](references/reference/llm-providers.md) · [docs](https://docs.pinecall.io/reference/llm-providers) |
23
+ | **Managed vs Bring-Your-Own-Key** | Which STT/TTS/LLM models Pinecall serves with its own keys, and which require yours. | [`references/reference/managed-vs-byok.md`](references/reference/managed-vs-byok.md) · [docs](https://docs.pinecall.io/reference/managed-vs-byok) |
23
24
  | **Session Limits** | Safety limits to prevent runaway sessions. | [`references/reference/session-limits.md`](references/reference/session-limits.md) · [docs](https://docs.pinecall.io/reference/session-limits) |
24
25
  | **REST API** | Static helpers for the Pinecall management API. No WebSocket needed. | [`references/reference/rest-api.md`](references/reference/rest-api.md) · [docs](https://docs.pinecall.io/reference/rest-api) |
25
26
 
@@ -37,6 +37,28 @@ llm: "gpt-5-chat-latest"
37
37
 
38
38
  > The legacy `provider:model` format (e.g. `"openai:gpt-5-chat-latest"`) still works but is not recommended.
39
39
 
40
+ ## Managed vs bring-your-own-key (BYOK)
41
+
42
+ Data-driven from the rate table — see [Managed vs BYOK](/reference/managed-vs-byok)
43
+ for the full list and the live `GET /api/rates/models` query.
44
+
45
+ | LLM provider | Managed (no key needed) | Notes |
46
+ |---|---|---|
47
+ | `openai` | ✅ Yes | Default, recommended |
48
+ | `anthropic` (`claude`) | ✅ Yes | |
49
+ | `google` (`gemini`) | ✅ Yes | |
50
+ | `mistral` | ✅ Yes | |
51
+ | `xai` (`grok`) | ❌ BYOK only | Add an xAI key |
52
+ | `groq` | ❌ BYOK only | Add a Groq key |
53
+ | `cerebras` | ❌ BYOK only | Add a Cerebras key |
54
+ | `deepseek` | ❌ BYOK only | Add a DeepSeek key |
55
+ | `openrouter` | ❌ BYOK only | One key → many models; model = full slug, e.g. `x-ai/grok-4` |
56
+
57
+ > **BYOK enforcement:** configuring a BYOK-only LLM provider without a saved key for
58
+ > it rejects agent registration with `PROVIDER_KEY_REQUIRED` — Pinecall never falls
59
+ > back to its own key. With your own key, those tokens are billed by the provider
60
+ > directly and are **not** deducted from your Pinecall credits.
61
+
40
62
  ## Tuning with a full config object
41
63
 
42
64
  For `temperature`, `max_tokens`, and other tuning parameters, use the full config object:
@@ -104,7 +126,7 @@ llm: {
104
126
  ## Google (Gemini)
105
127
 
106
128
  ```typescript
107
- llm: "google/gemini-2.0-flash"
129
+ llm: "google/gemini-2.5-flash"
108
130
  ```
109
131
 
110
132
  Or with tuning:
@@ -112,7 +134,7 @@ Or with tuning:
112
134
  ```typescript
113
135
  llm: {
114
136
  provider: "google",
115
- model: "gemini-2.0-flash",
137
+ model: "gemini-2.5-flash",
116
138
  enabled: true,
117
139
  temperature: 0.7,
118
140
  max_tokens: 512,
@@ -125,8 +147,7 @@ llm: {
125
147
 
126
148
  | Model | Best for |
127
149
  |---|---|
128
- | `gemini-2.0-flash` | Most voice agents — fast and low cost (recommended default) |
129
- | `gemini-2.5-flash` | Stronger reasoning at a modest cost bump |
150
+ | `gemini-2.5-flash` | Most voice agents — fast, low cost, strong reasoning (recommended default) |
130
151
 
131
152
  ## Anthropic
132
153
 
@@ -157,6 +178,49 @@ llm: {
157
178
 
158
179
  > Opus is intentionally **not** offered for voice agents — it's the premium tier (too slow/costly for real-time). Sonnet 4.6 and Haiku 4.5 are the supported Anthropic models. Set your `ANTHROPIC_API_KEY` on the server (managed) or add an Anthropic credential to your org (BYOK).
159
180
 
181
+ ## xAI Grok (BYOK)
182
+
183
+ ```typescript
184
+ llm: "xai/grok-4" // "grok" is accepted as an alias for "xai"
185
+ ```
186
+
187
+ OpenAI-compatible. Requires your own xAI key. Models: `grok-4`, `grok-4-fast`, `grok-3`.
188
+
189
+ ## Groq (BYOK)
190
+
191
+ ```typescript
192
+ llm: "groq/llama-3.3-70b-versatile"
193
+ ```
194
+
195
+ Fastest open-model inference. Requires your own Groq key.
196
+
197
+ ## Cerebras (BYOK)
198
+
199
+ ```typescript
200
+ llm: "cerebras/llama-3.3-70b"
201
+ ```
202
+
203
+ Highest tokens/sec. Requires your own Cerebras key.
204
+
205
+ ## DeepSeek (BYOK)
206
+
207
+ ```typescript
208
+ llm: "deepseek/deepseek-chat" // or "deepseek/deepseek-reasoner" (no tools)
209
+ ```
210
+
211
+ Requires your own DeepSeek key.
212
+
213
+ ## OpenRouter (BYOK)
214
+
215
+ One key unlocks hundreds of models (OpenAI, Anthropic, Google, xAI/Grok, Llama, …).
216
+ The `model` is the **full OpenRouter slug** — it keeps its own slash:
217
+
218
+ ```typescript
219
+ llm: { provider: "openrouter", model: "x-ai/grok-4" }
220
+ ```
221
+
222
+ Requires your own OpenRouter key.
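Because an OpenRouter slug contains its own slash, a client that accepts `provider/model` strings has to split on the first slash only. The helper below is a minimal client-side sketch of that idea. It is not Pinecall's parser: `splitShortcut`, `LlmConfig`, and `PROVIDER_ALIASES` are hypothetical names, and treating `"openrouter/x-ai/grok-4"` as a valid shortcut string is an assumption, since the section above only shows the object form for OpenRouter. The only alias included is `grok`, which the xAI section confirms.

```typescript
// Hypothetical shape mirroring the `{ provider, model }` object form shown above.
interface LlmConfig {
  provider: string;
  model: string;
}

// Only "grok" -> "xai" is confirmed by the docs above; a Map avoids prototype lookups.
const PROVIDER_ALIASES = new Map<string, string>([["grok", "xai"]]);

// Splits on the FIRST slash only, so a model slug may contain further slashes.
// Returns null when there is no provider part (e.g. a bare model name).
function splitShortcut(shortcut: string): LlmConfig | null {
  const i = shortcut.indexOf("/");
  if (i <= 0 || i === shortcut.length - 1) return null;
  const rawProvider = shortcut.slice(0, i);
  const model = shortcut.slice(i + 1);
  return { provider: PROVIDER_ALIASES.get(rawProvider) ?? rawProvider, model };
}
```

With this sketch, a bare model name such as `"gpt-5-chat-latest"` yields `null`, leaving provider selection to whatever default the caller applies.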
223
+
160
224
  ## The `enabled` field
161
225
 
162
226
  `enabled: false` disables server-side LLM for this agent. The server still does STT and TTS, but it won't generate responses — you handle every `turn.end` yourself with a client-side LLM.
@@ -0,0 +1,92 @@
1
+ ---
2
+ title: "Managed vs Bring-Your-Own-Key"
3
+ description: "Which STT/TTS/LLM models Pinecall serves with its own keys, and which require yours."
4
+ ---
5
+
6
+ # Managed vs Bring-Your-Own-Key (BYOK)
7
+
8
+ Every STT, TTS and LLM model on Pinecall is one of two kinds:
9
+
10
+ - **Managed** — Pinecall serves it with **its own provider key**. You don't add
11
+ anything; usage is deducted from your Pinecall **credits**.
12
+ - **BYOK (bring your own key)** — Pinecall does **not** host a key for it. You must
13
+ save your **own** API key under **Provider Keys**. That usage is billed by the
14
+ provider **directly** and is **not** deducted from your Pinecall credits.
15
+
16
+ > This split is **data-driven** — it comes from the Pinecall **rate table** in the
17
+ > database (each rate has a `managed` flag), not from a hardcoded list. The tables
18
+ > below are the current state; query the API (below) for the authoritative, live list.
19
+
20
+ ## What Pinecall provides managed (no key needed)
21
+
22
+ | Service | Managed providers |
23
+ |---|---|
24
+ | **STT** | `deepgram` (flux, nova-3), `gladia`, `transcribe` (AWS) |
25
+ | **TTS** | `elevenlabs`, `cartesia` (sonic), `polly` (AWS) |
26
+ | **LLM** | `openai`, `anthropic`, `google` (gemini), `mistral` |
27
+
28
+ ## What requires your own key (BYOK)
29
+
30
+ | Service | BYOK-only providers |
31
+ |---|---|
32
+ | **STT** | `cartesia` (ink-whisper), `elevenlabs` (scribe), `assemblyai` |
33
+ | **TTS** | `rime` |
34
+ | **LLM** | `xai` (grok), `groq`, `cerebras`, `deepseek`, `openrouter` |
35
+
36
+ > Note a provider can be **managed for one service and BYOK for another** — e.g.
37
+ > Cartesia **TTS** (sonic) is managed, but Cartesia **STT** (ink-whisper) is BYOK.
38
+ > ElevenLabs **TTS** is managed, ElevenLabs **STT** (scribe) is BYOK.
39
+
40
+ ## Check it from the API (authoritative, live)
41
+
42
+ The rate table is the source of truth. Query it any time:
43
+
44
+ ```bash
45
+ curl https://playground.pinecall.io/api/rates/models
46
+ ```
47
+
48
+ ```jsonc
49
+ {
50
+ "models": [
51
+ { "service": "stt", "provider": "deepgram", "model": "nova-3", "managed": true },
52
+ { "service": "stt", "provider": "assemblyai", "model": "universal", "managed": false },
53
+ { "service": "llm", "provider": "xai", "model": "grok-4", "managed": false },
54
+ { "service": "tts", "provider": "rime", "model": "mistv2", "managed": false }
55
+ // ...
56
+ ],
57
+ "managedProviders": {
58
+ "stt": ["deepgram", "gladia", "transcribe"],
59
+ "tts": ["cartesia", "elevenlabs", "polly"],
60
+ "llm": ["anthropic", "google", "mistral", "openai"]
61
+ }
62
+ }
63
+ ```
64
+
65
+ `managed: true` → usable with no key. `managed: false` → add your own key.
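The lookup described above can be sketched in TypeScript as follows. The field names come from the sample response; `RatesResponse`, `isManaged`, and `fetchRates` are hypothetical names, and calling the endpoint without an auth header is an assumption based on the `curl` example.

```typescript
type Service = "stt" | "tts" | "llm";

// Fields taken from the sample response above; the real payload may carry more.
interface RateModel {
  service: Service;
  provider: string;
  model: string;
  managed: boolean;
}

interface RatesResponse {
  models: RateModel[];
  managedProviders: Record<Service, string[]>;
}

// Returns the `managed` flag for an exact service/provider/model match,
// or undefined when the model is not listed in the rate table.
function isManaged(
  rates: RatesResponse,
  service: Service,
  provider: string,
  model: string,
): boolean | undefined {
  const hit = rates.models.find(
    (m) => m.service === service && m.provider === provider && m.model === model,
  );
  return hit?.managed;
}

// Fetches the live table. ASSUMPTION: no auth header is needed, as in the curl example.
async function fetchRates(
  url = "https://playground.pinecall.io/api/rates/models",
): Promise<RatesResponse> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`rates request failed with status ${res.status}`);
  return (await res.json()) as RatesResponse;
}
```

A caller would typically run `fetchRates()` once at startup and pass the result to `isManaged` before choosing a model; an `undefined` result means the model is not in the table at all.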
66
+
67
+ ## BYOK enforcement
68
+
69
+ If you configure a BYOK-only provider and your org has **not** saved a key for it,
70
+ **agent registration is rejected** with code `PROVIDER_KEY_REQUIRED`:
71
+
72
+ ```
73
+ LLM provider 'xai' requires your own API key. Pinecall does not provide a managed
74
+ key for 'xai' — add your key under Provider Keys in the dashboard, then reconnect.
75
+ ```
76
+
77
+ Pinecall never silently falls back to its own key for a BYOK provider.
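Since registration fails outright, it can be useful to check a configuration before connecting. The function below is a rough pre-flight sketch, not part of the Pinecall SDK: `missingProviderKeys` and its parameters are hypothetical, `managedProviders` is assumed to be the per-service map returned by `GET /api/rates/models`, and the check works at provider granularity only (it assumes every model of a managed provider is managed for that service).

```typescript
type Service = "stt" | "tts" | "llm";

// Returns the providers that would need a saved key but do not have one.
// An empty array means no PROVIDER_KEY_REQUIRED rejection is expected
// under the assumptions stated above.
function missingProviderKeys(
  wanted: Partial<Record<Service, string>>, // provider chosen for each service
  managedProviders: Record<Service, string[]>, // from GET /api/rates/models
  savedKeys: ReadonlySet<string>, // providers your org has saved a key for
): string[] {
  const missing: string[] = [];
  for (const service of ["stt", "tts", "llm"] as const) {
    const provider = wanted[service];
    if (provider === undefined) continue;
    const managed = managedProviders[service].includes(provider);
    if (!managed && !savedKeys.has(provider) && !missing.includes(provider)) {
      missing.push(provider);
    }
  }
  return missing;
}
```

Checking per service matters because a provider can be managed for one service and BYOK for another, for example Cartesia TTS versus Cartesia STT.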
78
+
79
+ ## Add your own key
80
+
81
+ - **Dashboard** → **Provider Keys** → pick the provider, paste the key.
82
+ - **API**: `PUT /api/credentials` with `{ "provider": "xai", "apiKey": "..." }`.
83
+
84
+ One key can cover multiple services where a provider shares it — e.g. an
85
+ **ElevenLabs** key enables both ElevenLabs TTS and ElevenLabs Scribe STT; a
86
+ **Cartesia** key enables Sonic TTS and Ink-Whisper STT.
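The API route above could be called along these lines. This is a sketch under stated assumptions: the method, path, and body shape come from the bullet above, but the base URL is left as a parameter because the docs do not name the host for this endpoint, and authenticating with an `X-API-Key` header is an assumption. `buildCredentialRequest` is a hypothetical helper name.

```typescript
interface CredentialRequest {
  url: string;
  init: { method: "PUT"; headers: Record<string, string>; body: string };
}

// Builds the request without sending it, so it can be inspected or logged first.
function buildCredentialRequest(
  baseUrl: string, // host serving /api/credentials (not stated in the docs)
  pinecallApiKey: string, // your Pinecall API key
  provider: string, // e.g. "xai"
  providerKey: string, // the key issued by that provider
): CredentialRequest {
  return {
    url: new URL("/api/credentials", baseUrl).toString(),
    init: {
      method: "PUT",
      headers: {
        "Content-Type": "application/json",
        "X-API-Key": pinecallApiKey, // ASSUMPTION: auth header name is not confirmed for this route
      },
      body: JSON.stringify({ provider, apiKey: providerKey }),
    },
  };
}
```

To send it, pass the parts to `fetch`: `const { url, init } = buildCredentialRequest(base, key, "xai", xaiKey); await fetch(url, init);`.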
87
+
88
+ ## What's next
89
+
90
+ - [STT Providers](/reference/stt-providers)
91
+ - [TTS Providers](/reference/tts-providers)
92
+ - [LLM Providers](/reference/llm-providers)
@@ -24,8 +24,35 @@ Pinecall supports multiple STT providers. Use the `provider/model` format or a f
24
24
 
25
25
  // AWS Transcribe
26
26
  { stt: "transcribe" }
27
+
28
+ // ── Bring-your-own-key only (add your key under Provider Keys first) ──
29
+ { stt: "cartesia/ink-whisper" } // Cartesia Ink-Whisper
30
+ { stt: "elevenlabs/scribe" } // ElevenLabs Scribe v2 (realtime)
31
+ { stt: "assemblyai/universal" } // AssemblyAI Universal-3
27
32
  ```
28
33
 
34
+ ## Managed vs bring-your-own-key (BYOK)
35
+
36
+ Some providers work out of the box on Pinecall's managed keys; the newer ones
37
+ require **your own API key** (saved under **Provider Keys** in the dashboard). This
38
+ split is data-driven from the rate table — see [Managed vs BYOK](/reference/managed-vs-byok)
39
+ for the full list and the live `GET /api/rates/models` query.
40
+
41
+ | STT provider | Managed (no key needed) | Notes |
42
+ |---|---|---|
43
+ | `deepgram` (flux/nova) | ✅ Yes | Default, recommended |
44
+ | `gladia` | ✅ Yes | |
45
+ | `transcribe` (AWS) | ✅ Yes | |
46
+ | `cartesia` (ink-whisper) | ❌ BYOK only | Add a Cartesia key |
47
+ | `elevenlabs` (scribe) | ❌ BYOK only | Add an ElevenLabs key |
48
+ | `assemblyai` (universal) | ❌ BYOK only | Add an AssemblyAI key |
49
+
50
+ > **BYOK enforcement:** if you configure a BYOK-only STT provider and your org has
51
+ > not saved a key for it, **agent registration is rejected** with
52
+ > `PROVIDER_KEY_REQUIRED` — Pinecall never falls back to its own key for these.
53
+ > When you bring your own key, that usage is billed by the provider directly and is
54
+ > **not** deducted from your Pinecall credits.
55
+
29
56
  ## Naming convention
30
57
 
31
58
  Configuration objects that pass through to providers keep **snake_case** to mirror what the receiving side expects (`endpointing_ms`, `interim_results`, etc.). This avoids an unnecessary translation layer and lets you copy-paste from provider docs directly.
@@ -108,6 +135,38 @@ stt: {
108
135
  }
109
136
  ```
110
137
 
138
+ ## Cartesia Ink-Whisper (BYOK)
139
+
140
+ Pairs naturally with Cartesia (Sonic) TTS for a single-vendor voice stack. Requires
141
+ your own Cartesia key.
142
+
143
+ ```typescript
144
+ stt: "cartesia/ink-whisper"
145
+ // or
146
+ stt: { provider: "cartesia", model: "ink-whisper", language: "en" }
147
+ ```
148
+
149
+ ## ElevenLabs Scribe (BYOK)
150
+
151
+ Realtime `scribe_v2_realtime`. Uses the same ElevenLabs key as ElevenLabs TTS.
152
+
153
+ ```typescript
154
+ stt: "elevenlabs/scribe"
155
+ // or
156
+ stt: { provider: "elevenlabs", model: "scribe_v2_realtime", language: "en" }
157
+ ```
158
+
159
+ ## AssemblyAI (BYOK)
160
+
161
+ Universal-3 streaming (`u3-rt-pro`) — strong accuracy + diarization. Requires your
162
+ own AssemblyAI key.
163
+
164
+ ```typescript
165
+ stt: "assemblyai/universal"
166
+ // or
167
+ stt: { provider: "assemblyai", model: "u3-rt-pro", language: "en" }
168
+ ```
169
+
111
170
  ## Which to choose
112
171
 
113
172
  | Provider | Best for | Trade-off |
@@ -116,6 +175,9 @@ stt: {
116
175
  | `deepgram/nova-3` | Arabic, Hindi, Thai, CJK, and 60+ languages | Slightly higher latency; smart_turn + silero VAD |
117
176
  | `gladia/solaria` | Code-switching, multilingual | Higher latency than Deepgram |
118
177
  | `transcribe` | AWS-native deployments | AWS pricing model |
178
+ | `cartesia/ink-whisper` | Single-vendor with Cartesia TTS | BYOK only |
179
+ | `elevenlabs/scribe` | Single-vendor with ElevenLabs TTS | BYOK only |
180
+ | `assemblyai/universal` | Accuracy + diarization | BYOK only |
119
181
 
120
182
  For most agents, start with `deepgram/flux`. Use `deepgram/nova-3` for languages Flux doesn't cover (Arabic, Hindi, Thai, Chinese, Japanese, Korean, etc.).
121
183
 
@@ -21,6 +21,22 @@ Pinecall supports multiple TTS providers. Use the `provider/friendly-id` format
21
21
 
22
22
  > The legacy `provider:rawId` format (e.g. `"elevenlabs:EXAVITQu4vr4xnSDxMaL"`) still works but is not recommended.
23
23
 
24
+ ## Managed vs bring-your-own-key (BYOK)
25
+
26
+ Data-driven from the rate table — see [Managed vs BYOK](/reference/managed-vs-byok)
27
+ for the full list and the live `GET /api/rates/models` query.
28
+
29
+ | TTS provider | Managed (no key needed) | Notes |
30
+ |---|---|---|
31
+ | `elevenlabs` | ✅ Yes | Default, recommended |
32
+ | `cartesia` (sonic) | ✅ Yes | |
33
+ | `polly` (AWS) | ✅ Yes | |
34
+ | `rime` | ❌ BYOK only | Add a Rime key under Provider Keys |
35
+
36
+ > **BYOK enforcement:** configuring `rime` without a saved Rime key rejects agent
37
+ > registration with `PROVIDER_KEY_REQUIRED`. With your own key, that usage is billed
38
+ > by the provider directly — **not** deducted from your Pinecall credits.
39
+
24
40
  ## Discovering voices
25
41
 
26
42
  Use the CLI to browse voices. Without flags, you get a catalog overview:
@@ -173,6 +189,21 @@ Shortcut: `"polly/joanna"`
173
189
  - `engine: "neural"` is required for natural-sounding output. The older `standard` engine is robotic.
174
190
  - Polly is the cheapest option but the least natural — fine for IVR-style flows, not for engaging conversation.
175
191
 
192
+ ## Rime (BYOK)
193
+
194
+ Ultra-natural, expressive English. Requires your own Rime key.
195
+
196
+ ```typescript
197
+ voice: {
198
+ provider: "rime",
199
+ voice_id: "cove", // Rime speaker id
200
+ model: "mistv2", // or "arcana" (most expressive)
201
+ speed: 1.0,
202
+ }
203
+ ```
204
+
205
+ Shortcut: `"rime/cove"`
206
+
176
207
  ## Which to choose
177
208
 
178
209
  | Provider | Best for | Trade-off |
@@ -180,6 +211,7 @@ Shortcut: `"polly/joanna"`
180
211
  | **ElevenLabs** | Most natural-sounding output | Higher cost per character |
181
212
  | **Cartesia** | Real-time streaming, low latency | Smaller voice library |
182
213
  | **Polly** | Cheap IVR, simple flows | Less natural |
214
+ | **Rime** | Ultra-natural expressive English | BYOK only; English-focused |
183
215
 
184
216
  For most agents, start with ElevenLabs (`eleven_flash_v2_5`) or Cartesia (`sonic-3`). Use Polly only for high-volume, low-engagement flows.
185
217
 
@@ -1,148 +0,0 @@
1
- ---
2
- title: "Self-Hosted LLM Gateway"
3
- description: "Consume Pinecall's hosted open model (Qwen3) for chat and structured analysis over an authenticated, plan-gated streaming endpoint."
4
- ---
5
-
6
- # Self-Hosted LLM Gateway
7
-
8
- Pinecall hosts an open LLM and exposes it through an authenticated streaming
9
- endpoint on the sdk-server. Use it for any task that wants a cheap, in-house LLM
10
- instead of a paid per-token provider: **chat / agent loops** and **structured
11
- analysis** (classification, extraction, summarization, recommendations).
12
-
13
- | Model | Size | Best for |
14
- |-------|------|----------|
15
- | `qwen3:14b` | ~9 GB | **default** — hybrid model: clean JSON/analysis with thinking off, step-by-step reasoning with thinking on |
16
- | `deepseek-r1:14b` | ~9 GB | dedicated reasoning — **coming soon** |
17
- | `qwen2.5-coder:14b` | ~9 GB | code generation, refactors, tool/JSON authoring — **coming soon** |
18
- | `mistral-nemo:12b` | ~7 GB | strong multilingual + 128k context — **coming soon** |
19
-
20
- > Models flagged **coming soon** aren't live yet — `GET /api/llm/models` always
21
- > returns the currently available set.
22
-
23
- ## Authentication & access
24
-
25
- - **Base URL:** `https://voice.pinecall.io`
26
- - **Auth:** a Pinecall API key via `X-API-Key: <key>` **or** `Authorization: Bearer <key>`.
27
- - **Plan gating:** **paid plans only** (`starter`, `pro`, `enterprise`). Both `free`
28
- and `free_trial` receive **`402 SUBSCRIPTION_REQUIRED`**.
29
-
30
- ## `POST /api/llm/chat`
31
-
32
- Streams the completion as **Server-Sent Events**.
33
-
34
- ### Request body
35
-
36
- ```jsonc
37
- {
38
- "messages": [{ "role": "user", "content": "..." }], // required
39
- "system": "optional system prompt",
40
- "model": "qwen3:14b", // default: qwen3:14b
41
- "mode": "chat" | "analysis", // default: "chat"
42
- "think": false, // reasoning on/off (default false; analysis forces false)
43
- "temperature": 0.7,
44
- "max_tokens": 512,
45
- "format": { /* JSON schema */ } | "json" // analysis mode only
46
- }
47
- ```
48
-
49
- Qwen3 is a **hybrid** model: `think: false` (the default) returns a clean, direct
50
- answer — best for JSON and low latency. `think: true` lets it reason step-by-step
51
- first (better on hard problems); the reasoning never leaks into the streamed
52
- answer. `mode: "analysis"` always forces thinking off so JSON stays clean.
53
-
54
- ### SSE event stream
55
-
56
- ```
57
- data: {"type":"token","content":"..."} // repeated — incremental text
58
- data: {"type":"done","usage":{"input_tokens":N,"output_tokens":M}}
59
- data: {"type":"error","error":"...","code":"UPSTREAM_ERROR|INTERNAL"}
60
- data: [DONE] // terminator
61
- ```
62
-
63
- ### Errors
64
-
65
- | Status | Code | Meaning |
66
- |--------|------|---------|
67
- | 401 | `MISSING_KEY` / `INVALID_KEY` | no or bad API key |
68
- | 402 | `SUBSCRIPTION_REQUIRED` | tier is `free` or `free_trial` |
69
- | 400 | `MISSING_MESSAGES` / `BAD_MODEL` / `BAD_REQUEST` | invalid request |
70
-
71
- ## `GET /api/llm/models`
72
-
73
- Same auth + gate. Returns the available models, the default, and the caller's tier —
74
- handy to probe access before streaming. **This is the source of truth for what's
75
- currently available** (the list grows over time).
76
-
77
- ```json
78
- { "models": ["qwen3:14b"], "default": "qwen3:14b", "tier": "pro" }
79
- ```
80
-
81
- ## Chat — streaming agent loop
82
-
83
- ```ts
84
- const res = await fetch("https://voice.pinecall.io/api/llm/chat", {
85
- method: "POST",
86
- headers: {
87
- "Content-Type": "application/json",
88
- "X-API-Key": process.env.PINECALL_API_KEY!,
89
- },
90
- body: JSON.stringify({
91
- model: "qwen3:14b",
92
- system: "You are a concise assistant.",
93
- messages: [{ role: "user", content: "Summarize today's bookings." }],
94
- // think: true, // ← opt into step-by-step reasoning for harder questions
95
- }),
96
- });
97
-
98
- const reader = res.body!.getReader();
99
- const dec = new TextDecoder();
100
- let buf = "";
101
- for (;;) {
102
- const { value, done } = await reader.read();
103
- if (done) break;
104
- buf += dec.decode(value, { stream: true });
105
- for (const line of buf.split("\n\n")) {
106
- if (!line.startsWith("data: ")) continue;
107
- const data = line.slice(6);
108
- if (data === "[DONE]") break;
109
- const evt = JSON.parse(data);
110
- if (evt.type === "token") process.stdout.write(evt.content);
111
- }
112
- buf = buf.slice(buf.lastIndexOf("\n\n") + 2);
113
- }
114
- ```
115
-
116
- ## Analysis — structured JSON (schema-enforced)
117
-
118
- Set `mode: "analysis"` and pass a JSON **schema** in `format`. The gateway routes
119
- analysis requests through a native path that constrains the output to your schema
120
- (and forces thinking off) — ideal for recommendations and extraction.
121
-
122
- ```ts
123
- const body = {
124
- model: "qwen3:14b",
125
- mode: "analysis",
126
- system: "You are a pricing engine. Return JSON only.",
127
- messages: [{ role: "user", content: "Service: deep-tissue massage, $80, 95% utilization, 60% margin. Recommend an optimal price." }],
128
- format: {
129
- type: "object",
130
- properties: {
131
- suggestedPrice: { type: "number" },
132
- confidence: { type: "string", enum: ["low", "medium", "high"] },
133
- rationale: { type: "string" },
134
- },
135
- required: ["suggestedPrice", "confidence", "rationale"],
136
- },
137
- };
138
- // POST as above, accumulate the `token` chunks into `text`, then:
139
- const rec = JSON.parse(text); // { suggestedPrice, confidence, rationale }
140
- ```
141
-
142
- > **Warning:** Pass a real JSON-schema **object**. The string `"json"` (OpenAI-style
143
- > `response_format`) only nudges the model toward JSON — it does **not** enforce a shape.
144
-
145
- > **Note:** This open model is for **in-app responders, analysis, and
146
- > recommendations**. For **live voice / WhatsApp agents**, the Pinecall server-side
147
- > LLM supports OpenAI / Mistral / Google / Anthropic — see
148
- > [LLM Providers](/reference/llm-providers).