@pinecall/skills 0.1.7 → 0.1.9
package/package.json (changed)
@@ -1,6 +1,6 @@
 {
 "name": "@pinecall/skills",
-"version": "0.1.
+"version": "0.1.9",
 "description": "Agent Skills for the Pinecall SDK — installable into Claude Code, Antigravity, Cursor, Copilot and any agent that supports the open Skills format.",
 "type": "module",
 "license": "MIT",
@@ -21,21 +21,24 @@ Every STT, TTS and LLM model on Pinecall is one of two kinds:
 
 | Service | Managed providers |
 |---|---|
-| **STT** | `deepgram` (flux, nova-3), `gladia`, `transcribe` (AWS) |
+| **STT** | `deepgram` (flux, nova-3), `gladia`, `transcribe` (AWS), `cartesia` (ink-whisper), `elevenlabs` (scribe) |
 | **TTS** | `elevenlabs`, `cartesia` (sonic), `polly` (AWS) |
 | **LLM** | `openai`, `anthropic`, `google` (gemini), `mistral` |
 
+> **One key, both services:** an ElevenLabs (or Cartesia) key serves *both* that
+> vendor's TTS and STT. Pinecall already holds those keys for the managed TTS, so
+> their STT (ElevenLabs **Scribe**, Cartesia **Ink-Whisper**) is **also managed** —
+> no key needed.
+
 ## What requires your own key (BYOK)
 
 | Service | BYOK-only providers |
 |---|---|
-| **STT** | `
-| **TTS** | `rime` |
+| **STT** | `assemblyai`, `soniox` |
+| **TTS** | `rime`, `soniox` |
 | **LLM** | `xai` (grok), `groq`, `cerebras`, `deepseek`, `openrouter` |
 
->
-> Cartesia **TTS** (sonic) is managed, but Cartesia **STT** (ink-whisper) is BYOK.
-> ElevenLabs **TTS** is managed, ElevenLabs **STT** (scribe) is BYOK.
+> `soniox` is one key for **both** STT and TTS (a Soniox key enables both).
 
 ## Check it from the API (authoritative, live)
 
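With Cartesia and ElevenLabs STT now managed and Soniox added on both sides, the BYOK keys a voice stack needs can be read straight off the two tables. The following is a minimal sketch: `requiredByokKeys` and the `Stack` shape are hypothetical (not Pinecall SDK APIs), and the provider lists are copied by hand from the 0.1.9 tables, so the live `GET /api/rates/models` response stays the source of truth.

```typescript
// Provider lists copied from the 0.1.9 BYOK table; they may drift from the live API.
type Service = "stt" | "tts" | "llm";

const BYOK_ONLY: Record<Service, readonly string[]> = {
  stt: ["assemblyai", "soniox"],
  tts: ["rime", "soniox"],
  llm: ["xai", "groq", "cerebras", "deepseek", "openrouter"],
};

// Hypothetical stack shape: one provider name per service.
type Stack = Record<Service, string>;

// Distinct vendor keys the org must save. A vendor used for several services
// (e.g. Soniox for STT and TTS) appears once, since one key covers both.
function requiredByokKeys(stack: Stack): string[] {
  const needed = new Set<string>();
  const services: Service[] = ["stt", "tts", "llm"];
  for (const service of services) {
    const provider = stack[service];
    if (BYOK_ONLY[service].includes(provider)) {
      needed.add(provider);
    }
  }
  return Array.from(needed);
}

console.log(requiredByokKeys({ stt: "soniox", tts: "soniox", llm: "openai" }));
// returns ["soniox"]
console.log(requiredByokKeys({ stt: "cartesia", tts: "elevenlabs", llm: "anthropic" }));
// returns []
```

Deduplicating on the vendor name is what encodes the "one key, both services" rule: a Soniox STT plus Soniox TTS stack needs a single saved key.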
@@ -55,7 +58,7 @@ curl https://playground.pinecall.io/api/rates/models
 // ...
 ],
 "managedProviders": {
-"stt": ["deepgram", "gladia", "transcribe"],
+"stt": ["cartesia", "deepgram", "elevenlabs", "gladia", "transcribe"],
 "tts": ["cartesia", "elevenlabs", "polly"],
 "llm": ["anthropic", "google", "mistral", "openai"]
 }
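Because the docs call this response authoritative, a client can read `managedProviders` instead of hard-coding provider lists. A sketch, assuming the response body is JSON containing the `managedProviders` object shown above; `parseManagedProviders` and `isManaged` are hypothetical helpers, and fetching the body is left out.

```typescript
type Service = "stt" | "tts" | "llm";
type ManagedProviders = Record<Service, string[]>;

// Pulls `managedProviders` out of a GET /api/rates/models body; any other
// fields in the response are ignored. Missing lists default to [].
function parseManagedProviders(body: string): ManagedProviders {
  const data = JSON.parse(body) as { managedProviders?: Partial<ManagedProviders> };
  const mp: Partial<ManagedProviders> = data.managedProviders ?? {};
  return { stt: mp.stt ?? [], tts: mp.tts ?? [], llm: mp.llm ?? [] };
}

// True when Pinecall hosts the key, i.e. no BYOK key is needed.
function isManaged(mp: ManagedProviders, service: Service, provider: string): boolean {
  return mp[service].includes(provider);
}

// Body trimmed to the managedProviders object shown in the 0.1.9 docs.
const body = `{
  "managedProviders": {
    "stt": ["cartesia", "deepgram", "elevenlabs", "gladia", "transcribe"],
    "tts": ["cartesia", "elevenlabs", "polly"],
    "llm": ["anthropic", "google", "mistral", "openai"]
  }
}`;

const managed = parseManagedProviders(body);
console.log(isManaged(managed, "stt", "elevenlabs")); // true
console.log(isManaged(managed, "stt", "soniox")); // false
```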
@@ -29,6 +29,7 @@ Pinecall supports multiple STT providers. Use the `provider/model` format or a f
 { stt: "cartesia/ink-whisper" } // Cartesia Ink-Whisper
 { stt: "elevenlabs/scribe" } // ElevenLabs Scribe v2 (realtime)
 { stt: "assemblyai/universal" } // AssemblyAI Universal-3
+{ stt: "soniox/realtime" } // Soniox real-time (BYOK)
 ```
 
 ## Managed vs bring-your-own-key (BYOK)
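The `provider/model` shortcut packs two values into one string. A hypothetical helper that splits it on the first `/` is sketched here. It returns the alias as written and does not resolve it to a concrete model id (the Soniox section pairs `soniox/realtime` with model `stt-rt-v5`, so the mapping is not always a plain split); it is not an SDK function.

```typescript
interface ShortcutParts {
  provider: string;
  model: string;
}

// Hypothetical helper: splits "provider/model" on the first "/". The model part
// is returned as written (an alias such as "realtime"), not resolved to a model id.
function splitSttShortcut(shortcut: string): ShortcutParts {
  const slash = shortcut.indexOf("/");
  if (slash <= 0 || slash === shortcut.length - 1) {
    throw new Error(`expected "provider/model", got "${shortcut}"`);
  }
  return { provider: shortcut.slice(0, slash), model: shortcut.slice(slash + 1) };
}

for (const s of ["cartesia/ink-whisper", "elevenlabs/scribe", "assemblyai/universal", "soniox/realtime"]) {
  const { provider, model } = splitSttShortcut(s);
  console.log(`${provider} -> ${model}`);
}
```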
@@ -43,9 +44,10 @@ for the full list and the live `GET /api/rates/models` query.
 | `deepgram` (flux/nova) | ✅ Yes | Default, recommended |
 | `gladia` | ✅ Yes | |
 | `transcribe` (AWS) | ✅ Yes | |
-| `cartesia` (ink-whisper) |
-| `elevenlabs` (scribe) |
+| `cartesia` (ink-whisper) | ✅ Yes | Same key as Cartesia TTS — Pinecall hosts it |
+| `elevenlabs` (scribe) | ✅ Yes | Same key as ElevenLabs TTS — Pinecall hosts it |
 | `assemblyai` (universal) | ❌ BYOK only | Add an AssemblyAI key |
+| `soniox` (realtime) | ❌ BYOK only | One Soniox key = STT **and** TTS |
 
 > **BYOK enforcement:** if you configure a BYOK-only STT provider and your org has
 > not saved a key for it, **agent registration is rejected** with
@@ -135,10 +137,11 @@ stt: {
 }
 ```
 
-## Cartesia Ink-Whisper
+## Cartesia Ink-Whisper
 
-Pairs naturally with Cartesia (Sonic) TTS for a single-vendor voice stack.
-
+Pairs naturally with Cartesia (Sonic) TTS for a single-vendor voice stack.
+**Managed** — the same Cartesia key serves TTS and STT, and Pinecall hosts it (or
+bring your own Cartesia key to bill it directly).
 
 ```typescript
 stt: "cartesia/ink-whisper"
@@ -146,9 +149,10 @@ stt: "cartesia/ink-whisper"
 stt: { provider: "cartesia", model: "ink-whisper", language: "en" }
 ```
 
-## ElevenLabs Scribe
+## ElevenLabs Scribe
 
-Realtime `scribe_v2_realtime`.
+Realtime `scribe_v2_realtime`. **Managed** — uses the same ElevenLabs key as
+ElevenLabs TTS, which Pinecall hosts (or bring your own ElevenLabs key).
 
 ```typescript
 stt: "elevenlabs/scribe"
@@ -163,8 +167,8 @@ stt: {
 
 ## AssemblyAI (BYOK)
 
-Universal-3 streaming (`u3-rt-pro`) — strong accuracy + diarization.
-
+Universal-3 streaming (`u3-rt-pro`) — strong accuracy + diarization. **BYOK only** —
+Pinecall hosts no AssemblyAI key, so add your own under Provider Keys.
 
 ```typescript
 stt: "assemblyai/universal"
@@ -177,6 +181,17 @@ stt: {
 }
 ```
 
+## Soniox (BYOK)
+
+Real-time multilingual STT (60+ languages). One Soniox key serves **both** Soniox
+STT and TTS. Requires your own Soniox key.
+
+```typescript
+stt: "soniox/realtime"
+// or
+stt: { provider: "soniox", model: "stt-rt-v5", language: "en" }
+```
+
 ## Which to choose
 
 | Provider | Best for | Trade-off |
@@ -185,9 +200,10 @@ stt: {
 | `deepgram/nova-3` | Arabic, Hindi, Thai, CJK, and 60+ languages | Slightly higher latency; smart_turn + silero VAD |
 | `gladia/solaria` | Code-switching, multilingual | Higher latency than Deepgram |
 | `transcribe` | AWS-native deployments | AWS pricing model |
-| `cartesia/ink-whisper` | Single-vendor with Cartesia TTS |
-| `elevenlabs/scribe` | Single-vendor with ElevenLabs TTS |
+| `cartesia/ink-whisper` | Single-vendor with Cartesia TTS | Managed (shared key) |
+| `elevenlabs/scribe` | Single-vendor with ElevenLabs TTS | Managed (shared key) |
 | `assemblyai/universal` | Accuracy + diarization | BYOK only |
+| `soniox/realtime` | Multilingual (60+), single-vendor with Soniox TTS | BYOK only |
 
 For most agents, start with `deepgram/flux`. Use `deepgram/nova-3` for languages Flux doesn't cover (Arabic, Hindi, Thai, Chinese, Japanese, Korean, etc.).
 
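The closing recommendation reduces to a small rule. A sketch, assuming ISO 639-1 style codes like the `language: "en"` used in the config examples; `pickStt` is a hypothetical helper, and its Nova-3 set holds only the languages the sentence names. Since that list ends in "etc.", the set is incomplete by design.

```typescript
// Only the languages named in the recommendation; the docs say "etc.", so this
// set is deliberately incomplete.
const NOVA_3_ONLY = new Set(["ar", "hi", "th", "zh", "ja", "ko"]);

type DeepgramPick = "deepgram/flux" | "deepgram/nova-3";

// Flux by default; Nova-3 when the primary language subtag is in the set above.
function pickStt(language: string): DeepgramPick {
  const primary = language.toLowerCase().split("-")[0];
  return NOVA_3_ONLY.has(primary) ? "deepgram/nova-3" : "deepgram/flux";
}

console.log(pickStt("en")); // deepgram/flux
console.log(pickStt("ja-JP")); // deepgram/nova-3
```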
@@ -29,9 +29,10 @@ for the full list and the live `GET /api/rates/models` query.
 | TTS provider | Managed (no key needed) | Notes |
 |---|---|---|
 | `elevenlabs` | ✅ Yes | Default, recommended |
-| `cartesia` (sonic) | ✅ Yes | |
+| `cartesia` (sonic-3.5) | ✅ Yes | |
 | `polly` (AWS) | ✅ Yes | |
 | `rime` | ❌ BYOK only | Add a Rime key under Provider Keys |
+| `soniox` | ❌ BYOK only | One Soniox key = TTS **and** STT |
 
 > **BYOK enforcement:** configuring `rime` without a saved Rime key rejects agent
 > registration with `PROVIDER_KEY_REQUIRED`. With your own key, that usage is billed
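A local pre-flight check can surface the same failure before the agent is registered. This is a hypothetical helper that only imitates the server-side rule: the BYOK-only TTS list (`rime`, `soniox`) comes from the 0.1.9 table, the `PROVIDER_KEY_REQUIRED` code from the enforcement note, and `savedKeys` is an assumed stand-in for whatever the org has stored under Provider Keys.

```typescript
// From the 0.1.9 TTS table: providers with no managed key.
const BYOK_ONLY_TTS = new Set(["rime", "soniox"]);

type PreflightResult =
  | { ok: true }
  | { ok: false; code: "PROVIDER_KEY_REQUIRED"; provider: string };

// Hypothetical local check; the real rejection happens at agent registration.
// `savedKeys` holds the vendor names the org has stored under Provider Keys.
function preflightTts(provider: string, savedKeys: ReadonlySet<string>): PreflightResult {
  if (BYOK_ONLY_TTS.has(provider) && !savedKeys.has(provider)) {
    return { ok: false, code: "PROVIDER_KEY_REQUIRED", provider };
  }
  return { ok: true };
}

console.log(preflightTts("rime", new Set<string>())); // not ok: PROVIDER_KEY_REQUIRED
console.log(preflightTts("soniox", new Set(["soniox"]))); // ok
console.log(preflightTts("elevenlabs", new Set<string>())); // ok (managed)
```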
@@ -156,7 +157,7 @@ The model is part of the voice config, so it hot-reloads with it — `agent.upda
 voice: {
 provider: "cartesia",
 voice_id: "a0e99841-438c-4a64-b679-ae501e7d6091",
-model: "sonic-3",
+model: "sonic-3.5", // latest; also "sonic-3" / "sonic-latest"
 speed: 1.0,
 volume: 1.0,
 emotion: null,
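With `model` now documented as one of three names, a string-literal union catches typos at compile time. The `CartesiaModel` and `CartesiaVoice` types below are illustrative assumptions rather than SDK exports, and typing `emotion` as `string | null` is a guess based on the "named emotion presets" tuning note.

```typescript
// The three model names listed in this diff; the union itself is an assumption.
type CartesiaModel = "sonic-3.5" | "sonic-3" | "sonic-latest";

// Hypothetical typing of the voice block shown above.
interface CartesiaVoice {
  provider: "cartesia";
  voice_id: string;
  model: CartesiaModel;
  speed: number;
  volume: number;
  emotion: string | null; // named preset, or null for none
}

const voice: CartesiaVoice = {
  provider: "cartesia",
  voice_id: "a0e99841-438c-4a64-b679-ae501e7d6091",
  model: "sonic-3.5", // a typo such as "sonic-35" would fail to compile
  speed: 1.0,
  volume: 1.0,
  emotion: null,
};

console.log(voice.model);
```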
@@ -168,7 +169,7 @@ Shortcut: `"cartesia/yumiko"`
 
 **Tuning notes:**
 
-- `model: "sonic-3"` — fastest Cartesia model, designed for streaming
+- `model: "sonic-3.5"` — latest/fastest Cartesia model (sub-90ms, 42 languages), designed for streaming. `sonic-3` and `sonic-latest` also available.
 - `emotion` accepts named emotion presets (check Cartesia docs for the current list)
 
 ## AWS Polly
@@ -204,6 +205,22 @@ voice: {
 
 Shortcut: `"rime/cove"`
 
+## Soniox (BYOK)
+
+Real-time TTS in 60+ languages. One Soniox key serves **both** Soniox TTS and STT.
+Requires your own Soniox key.
+
+```typescript
+voice: {
+provider: "soniox",
+voice_id: "Adrian", // Soniox voice name
+model: "tts-rt-v1",
+language: "en",
+}
+```
+
+Shortcut: `"soniox/Adrian"`
+
 ## Which to choose
 
 | Provider | Best for | Trade-off |
@@ -212,8 +229,9 @@ Shortcut: `"rime/cove"`
 | **Cartesia** | Real-time streaming, low latency | Smaller voice library |
 | **Polly** | Cheap IVR, simple flows | Less natural |
 | **Rime** | Ultra-natural expressive English | BYOK only; English-focused |
+| **Soniox** | Multilingual (60+), single-vendor with Soniox STT | BYOK only |
 
-For most agents, start with ElevenLabs (`eleven_flash_v2_5`) or Cartesia (`sonic-3`). Use Polly only for high-volume, low-engagement flows.
+For most agents, start with ElevenLabs (`eleven_flash_v2_5`) or Cartesia (`sonic-3.5`). Use Polly only for high-volume, low-engagement flows.
 
 ## Hot-reloading voices
 