@pinecall/skills 0.1.4 → 0.1.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@pinecall/skills",
-  "version": "0.1.4",
+  "version": "0.1.6",
   "description": "Agent Skills for the Pinecall SDK — installable into Claude Code, Antigravity, Cursor, Copilot and any agent that supports the open Skills format.",
   "type": "module",
   "license": "MIT",

package/skills/pinecall-guides/SKILL.md CHANGED

@@ -26,7 +26,6 @@ table below indexes every page; open the `references/…` file for the full text
 | **Tools and Functions** | Let your agent take actions: look up data, transfer calls, book appointments. | [`references/guides/tools-and-functions.md`](references/guides/tools-and-functions.md) · [docs](https://docs.pinecall.io/guides/tools-and-functions) |
 | **Knowledge bases (RAG)** | Tutorial — ground a voice or chat agent on your own documents with retrieval-augmented generation. | [`references/guides/knowledge-bases.md`](references/guides/knowledge-bases.md) · [docs](https://docs.pinecall.io/guides/knowledge-bases) |
 | **Multi-Tenant Dashboards** | Host many tenants on one Pinecall instance with scoped event streams. | [`references/guides/multi-tenant.md`](references/guides/multi-tenant.md) · [docs](https://docs.pinecall.io/guides/multi-tenant) |
-| **Self-Hosted LLM Gateway** | Consume Pinecall's hosted open model (Qwen3) for chat and structured analysis over an authenticated, plan-gated streaming endpoint. | [`references/guides/self-hosted-llm.md`](references/guides/self-hosted-llm.md) · [docs](https://docs.pinecall.io/guides/self-hosted-llm) |
 | **SSE Event Streaming** | Stream agent events to your frontend in real time with Server-Sent Events. | [`references/guides/sse-streaming.md`](references/guides/sse-streaming.md) · [docs](https://docs.pinecall.io/guides/sse-streaming) |
 | **WebSocket Event Streaming** | Stream agent events over WebSocket for bidirectional, real-time communication with your frontend. | [`references/guides/ws-streaming.md`](references/guides/ws-streaming.md) · [docs](https://docs.pinecall.io/guides/ws-streaming) |
 | **Dev Mode** | Run dev and production agents on the same phone number, with zero extra Twilio cost. | [`references/guides/dev-mode.md`](references/guides/dev-mode.md) · [docs](https://docs.pinecall.io/guides/dev-mode) |

package/skills/pinecall-reference/SKILL.md CHANGED

@@ -20,6 +20,7 @@ table below indexes every page; open the `references/…` file for the full text
 | **STT Providers** | Speech-to-text providers, models, and tuning parameters. | [`references/reference/stt-providers.md`](references/reference/stt-providers.md) · [docs](https://docs.pinecall.io/reference/stt-providers) |
 | **TTS Providers** | Text-to-speech providers, voices, and tuning parameters. | [`references/reference/tts-providers.md`](references/reference/tts-providers.md) · [docs](https://docs.pinecall.io/reference/tts-providers) |
 | **LLM Providers** | Server-side LLM providers and configuration. | [`references/reference/llm-providers.md`](references/reference/llm-providers.md) · [docs](https://docs.pinecall.io/reference/llm-providers) |
+| **Managed vs Bring-Your-Own-Key** | Which STT/TTS/LLM models Pinecall serves with its own keys, and which require yours. | [`references/reference/managed-vs-byok.md`](references/reference/managed-vs-byok.md) · [docs](https://docs.pinecall.io/reference/managed-vs-byok) |
 | **Session Limits** | Safety limits to prevent runaway sessions. | [`references/reference/session-limits.md`](references/reference/session-limits.md) · [docs](https://docs.pinecall.io/reference/session-limits) |
 | **REST API** | Static helpers for the Pinecall management API. No WebSocket needed. | [`references/reference/rest-api.md`](references/reference/rest-api.md) · [docs](https://docs.pinecall.io/reference/rest-api) |

package/skills/pinecall-reference/references/reference/llm-providers.md CHANGED

@@ -37,6 +37,28 @@ llm: "gpt-5-chat-latest"
 > The legacy `provider:model` format (e.g. `"openai:gpt-5-chat-latest"`) still works but is not recommended.
+## Managed vs bring-your-own-key (BYOK)
+Data-driven from the rate table — see [Managed vs BYOK](/reference/managed-vs-byok)
+for the full list and the live `GET /api/rates/models` query.
+| LLM provider | Managed (no key needed) | Notes |
+|---|---|---|
+| `openai` | ✅ Yes | Default, recommended |
+| `anthropic` (`claude`) | ✅ Yes | |
+| `google` (`gemini`) | ✅ Yes | |
+| `mistral` | ✅ Yes | |
+| `xai` (`grok`) | ❌ BYOK only | Add an xAI key |
+| `groq` | ❌ BYOK only | Add a Groq key |
+| `cerebras` | ❌ BYOK only | Add a Cerebras key |
+| `deepseek` | ❌ BYOK only | Add a DeepSeek key |
+| `openrouter` | ❌ BYOK only | One key → many models; model = full slug, e.g. `x-ai/grok-4` |
+> **BYOK enforcement:** configuring a BYOK-only LLM provider without a saved key for
+> it rejects agent registration with `PROVIDER_KEY_REQUIRED` — Pinecall never falls
+> back to its own key. With your own key, those tokens are billed by the provider
+> directly and are **not** deducted from your Pinecall credits.
 ## Tuning with a full config object
 For `temperature`, `max_tokens`, and other tuning parameters, use the full config object:
@@ -104,7 +126,7 @@ llm: {
 ## Google (Gemini)
 ```typescript
-llm: "google/gemini-2.0-flash"
+llm: "google/gemini-2.5-flash"
 ```
 Or with tuning:
@@ -112,7 +134,7 @@ Or with tuning:
 ```typescript
 llm: {
   provider: "google",
-  model: "gemini-2.0-flash",
+  model: "gemini-2.5-flash",
   enabled: true,
   temperature: 0.7,
   max_tokens: 512,
@@ -125,8 +147,7 @@ llm: {
 | Model | Best for |
 |---|---|
-| `gemini-2.0-flash` | Most voice agents — fast and low cost (recommended default) |
-| `gemini-2.5-flash` | Stronger reasoning at a modest cost bump |
+| `gemini-2.5-flash` | Most voice agents — fast, low cost, strong reasoning (recommended default) |
 ## Anthropic
@@ -157,6 +178,49 @@ llm: {
 > Opus is intentionally **not** offered for voice agents — it's the premium tier (too slow/costly for real-time). Sonnet 4.6 and Haiku 4.5 are the supported Anthropic models. Set your `ANTHROPIC_API_KEY` on the server (managed) or add an Anthropic credential to your org (BYOK).
+## xAI Grok (BYOK)
+```typescript
+llm: "xai/grok-4"        // "grok" is accepted as an alias for "xai"
+```
+OpenAI-compatible. Requires your own xAI key. Models: `grok-4`, `grok-4-fast`, `grok-3`.
+## Groq (BYOK)
+```typescript
+llm: "groq/llama-3.3-70b-versatile"
+```
+Fastest open-model inference. Requires your own Groq key.
+## Cerebras (BYOK)
+```typescript
+llm: "cerebras/llama-3.3-70b"
+```
+Highest tokens/sec. Requires your own Cerebras key.
+## DeepSeek (BYOK)
+```typescript
+llm: "deepseek/deepseek-chat"     // or "deepseek/deepseek-reasoner" (no tools)
+```
+Requires your own DeepSeek key.
+## OpenRouter (BYOK)
+One key unlocks hundreds of models (OpenAI, Anthropic, Google, xAI/Grok, Llama, …).
+The `model` is the **full OpenRouter slug** — it keeps its own slash:
+```typescript
+llm: { provider: "openrouter", model: "x-ai/grok-4" }
+```
+Requires your own OpenRouter key.
 ## The `enabled` field
 `enabled: false` disables server-side LLM for this agent. The server still does STT and TTS, but it won't generate responses — you handle every `turn.end` yourself with a client-side LLM.
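
Reviewer's sketch, not part of the package: the BYOK table added in this file implies a simple pre-flight check a client could run before registering an agent, to avoid a `PROVIDER_KEY_REQUIRED` rejection. The provider list and the `grok` alias come from the diff; `LlmConfig`, `llmProvider` and `missingLlmKey` are hypothetical names, and the set of saved keys is assumed to be known to the caller. The legacy `provider:model` form and bare model names are deliberately left unresolved.

```typescript
// Hypothetical helper, not a Pinecall SDK API. The list mirrors the table in
// this diff; the live rate table remains the authoritative source.
type LlmConfig = string | { provider: string; model: string };

const BYOK_ONLY_LLM = new Set(["xai", "groq", "cerebras", "deepseek", "openrouter"]);

// Only the alias the diff states explicitly ("grok" for "xai").
const ALIASES: Record<string, string> = { grok: "xai" };

function normalize(name: string): string {
  const key = name.trim().toLowerCase();
  return ALIASES[key] ?? key;
}

// Returns the provider named by the config, or null when none is spelled out.
function llmProvider(llm: LlmConfig): string | null {
  if (typeof llm !== "string") return normalize(llm.provider);
  // Split on the first slash only, so a model slug with its own slash
  // (e.g. "x-ai/grok-4") is never mistaken for part of the provider.
  const slash = llm.indexOf("/");
  return slash === -1 ? null : normalize(llm.slice(0, slash));
}

// Returns the provider whose key is missing, or null when nothing is missing.
function missingLlmKey(llm: LlmConfig, savedKeys: ReadonlySet<string>): string | null {
  const provider = llmProvider(llm);
  if (provider === null) return null;
  return BYOK_ONLY_LLM.has(provider) && !savedKeys.has(provider) ? provider : null;
}

const saved = new Set(["groq"]);
console.log(missingLlmKey("grok/grok-4", saved));                                    // "xai"
console.log(missingLlmKey("groq/llama-3.3-70b-versatile", saved));                   // null
console.log(missingLlmKey({ provider: "openrouter", model: "x-ai/grok-4" }, saved)); // "openrouter"
```

A managed provider such as `google/gemini-2.5-flash` returns `null` here regardless of saved keys, which matches the "no key needed" column.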

package/skills/pinecall-reference/references/reference/managed-vs-byok.md ADDED

@@ -0,0 +1,92 @@
+---
+title: "Managed vs Bring-Your-Own-Key"
+description: "Which STT/TTS/LLM models Pinecall serves with its own keys, and which require yours."
+---
+# Managed vs Bring-Your-Own-Key (BYOK)
+Every STT, TTS and LLM model on Pinecall is one of two kinds:
+- **Managed** — Pinecall serves it with **its own provider key**. You don't add
+  anything; usage is deducted from your Pinecall **credits**.
+- **BYOK (bring your own key)** — Pinecall does **not** host a key for it. You must
+  save your **own** API key under **Provider Keys**. That usage is billed by the
+  provider **directly** and is **not** deducted from your Pinecall credits.
+> This split is **data-driven** — it comes from the Pinecall **rate table** in the
+> database (each rate has a `managed` flag), not from a hardcoded list. The tables
+> below are the current state; query the API (below) for the authoritative, live list.
+## What Pinecall provides managed (no key needed)
+| Service | Managed providers |
+|---|---|
+| **STT** | `deepgram` (flux, nova-3), `gladia`, `transcribe` (AWS) |
+| **TTS** | `elevenlabs`, `cartesia` (sonic), `polly` (AWS) |
+| **LLM** | `openai`, `anthropic`, `google` (gemini), `mistral` |
+## What requires your own key (BYOK)
+| Service | BYOK-only providers |
+|---|---|
+| **STT** | `cartesia` (ink-whisper), `elevenlabs` (scribe), `assemblyai` |
+| **TTS** | `rime` |
+| **LLM** | `xai` (grok), `groq`, `cerebras`, `deepseek`, `openrouter` |
+> Note a provider can be **managed for one service and BYOK for another** — e.g.
+> Cartesia **TTS** (sonic) is managed, but Cartesia **STT** (ink-whisper) is BYOK.
+> ElevenLabs **TTS** is managed, ElevenLabs **STT** (scribe) is BYOK.
+## Check it from the API (authoritative, live)
+The rate table is the source of truth. Query it any time:
+```bash
+curl https://playground.pinecall.io/api/rates/models
+```
+```jsonc
+{
+  "models": [
+    { "service": "stt", "provider": "deepgram",   "model": "nova-3",      "managed": true  },
+    { "service": "stt", "provider": "assemblyai",  "model": "universal",   "managed": false },
+    { "service": "llm", "provider": "xai",         "model": "grok-4",      "managed": false },
+    { "service": "tts", "provider": "rime",        "model": "mistv2",      "managed": false }
+    // ...
+  ],
+  "managedProviders": {
+    "stt": ["deepgram", "gladia", "transcribe"],
+    "tts": ["cartesia", "elevenlabs", "polly"],
+    "llm": ["anthropic", "google", "mistral", "openai"]
+  }
+}
+```
+`managed: true` → usable with no key. `managed: false` → add your own key.
+## BYOK enforcement
+If you configure a BYOK-only provider and your org has **not** saved a key for it,
+**agent registration is rejected** with code `PROVIDER_KEY_REQUIRED`:
+```
+LLM provider 'xai' requires your own API key. Pinecall does not provide a managed
+key for 'xai' — add your key under Provider Keys in the dashboard, then reconnect.
+```
+Pinecall never silently falls back to its own key for a BYOK provider.
+## Add your own key
+- **Dashboard** → **Provider Keys** → pick the provider, paste the key.
+- **API**: `PUT /api/credentials` with `{ "provider": "xai", "apiKey": "..." }`.
+One key can cover multiple services where a provider shares it — e.g. an
+**ElevenLabs** key enables both ElevenLabs TTS and ElevenLabs Scribe STT; a
+**Cartesia** key enables Sonic TTS and Ink-Whisper STT.
+## What's next
+- [STT Providers](/reference/stt-providers)
+- [TTS Providers](/reference/tts-providers)
+- [LLM Providers](/reference/llm-providers)
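
Reviewer's sketch, not part of the package: the new page says the same provider can be managed for one service and BYOK for another, so any lookup against the `GET /api/rates/models` response has to be keyed by service as well as provider. The types below are inferred from the sample response in this diff only; `isManaged` is a hypothetical name, and the `cartesia` rows in the sample are hand-written from the page's tables rather than taken from a real response.

```typescript
// Shapes inferred from the sample JSON in the diff; the real response may carry more fields.
type Service = "stt" | "tts" | "llm";

interface RateModel {
  service: Service;
  provider: string;
  model: string;
  managed: boolean;
}

interface RatesResponse {
  models: RateModel[];
  managedProviders: Record<Service, string[]>;
}

// true  -> every matching rate is managed (usable with no key)
// false -> at least one matching rate needs your own key
// undefined -> the rate table has no matching row
function isManaged(
  rates: RatesResponse,
  service: Service,
  provider: string,
  model?: string,
): boolean | undefined {
  const rows = rates.models.filter(
    (m) =>
      m.service === service &&
      m.provider === provider &&
      (model === undefined || m.model === model),
  );
  if (rows.length === 0) return undefined;
  return rows.every((m) => m.managed);
}

// Hand-written sample in the documented shape (not a captured response).
const sample: RatesResponse = {
  models: [
    { service: "stt", provider: "deepgram", model: "nova-3", managed: true },
    { service: "stt", provider: "cartesia", model: "ink-whisper", managed: false },
    { service: "tts", provider: "cartesia", model: "sonic", managed: true },
    { service: "llm", provider: "xai", model: "grok-4", managed: false },
  ],
  managedProviders: {
    stt: ["deepgram", "gladia", "transcribe"],
    tts: ["cartesia", "elevenlabs", "polly"],
    llm: ["anthropic", "google", "mistral", "openai"],
  },
};

console.log(isManaged(sample, "tts", "cartesia")); // true
console.log(isManaged(sample, "stt", "cartesia")); // false
console.log(isManaged(sample, "tts", "rime"));     // undefined (no row in this sample)
```

Returning `undefined` for a missing row keeps "unknown to the rate table" distinct from "BYOK", which the page treats as different situations.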

package/skills/pinecall-reference/references/reference/stt-providers.md CHANGED

@@ -24,8 +24,35 @@ Pinecall supports multiple STT providers. Use the `provider/model` format or a f
 // AWS Transcribe
 { stt: "transcribe" }
+// ── Bring-your-own-key only (add your key under Provider Keys first) ──
+{ stt: "cartesia/ink-whisper" }      // Cartesia Ink-Whisper
+{ stt: "elevenlabs/scribe" }         // ElevenLabs Scribe v2 (realtime)
+{ stt: "assemblyai/universal" }      // AssemblyAI Universal-3
 ```
+## Managed vs bring-your-own-key (BYOK)
+Some providers work out of the box on Pinecall's managed keys; the newer ones
+require **your own API key** (saved under **Provider Keys** in the dashboard). This
+split is data-driven from the rate table — see [Managed vs BYOK](/reference/managed-vs-byok)
+for the full list and the live `GET /api/rates/models` query.
+| STT provider | Managed (no key needed) | Notes |
+|---|---|---|
+| `deepgram` (flux/nova) | ✅ Yes | Default, recommended |
+| `gladia` | ✅ Yes | |
+| `transcribe` (AWS) | ✅ Yes | |
+| `cartesia` (ink-whisper) | ❌ BYOK only | Add a Cartesia key |
+| `elevenlabs` (scribe) | ❌ BYOK only | Add an ElevenLabs key |
+| `assemblyai` (universal) | ❌ BYOK only | Add an AssemblyAI key |
+> **BYOK enforcement:** if you configure a BYOK-only STT provider and your org has
+> not saved a key for it, **agent registration is rejected** with
+> `PROVIDER_KEY_REQUIRED` — Pinecall never falls back to its own key for these.
+> When you bring your own key, that usage is billed by the provider directly and is
+> **not** deducted from your Pinecall credits.
 ## Naming convention
 Configuration objects that pass through to providers keep **snake_case** to mirror what the receiving side expects (`endpointing_ms`, `interim_results`, etc.). This avoids an unnecessary translation layer and lets you copy-paste from provider docs directly.
@@ -108,6 +135,38 @@ stt: {
 }
 ```
+## Cartesia Ink-Whisper (BYOK)
+Pairs naturally with Cartesia (Sonic) TTS for a single-vendor voice stack. Requires
+your own Cartesia key.
+```typescript
+stt: "cartesia/ink-whisper"
+// or
+stt: { provider: "cartesia", model: "ink-whisper", language: "en" }
+```
+## ElevenLabs Scribe (BYOK)
+Realtime `scribe_v2_realtime`. Uses the same ElevenLabs key as ElevenLabs TTS.
+```typescript
+stt: "elevenlabs/scribe"
+// or
+stt: { provider: "elevenlabs", model: "scribe_v2_realtime", language: "en" }
+```
+## AssemblyAI (BYOK)
+Universal-3 streaming (`u3-rt-pro`) — strong accuracy + diarization. Requires your
+own AssemblyAI key.
+```typescript
+stt: "assemblyai/universal"
+// or
+stt: { provider: "assemblyai", model: "u3-rt-pro", language: "en" }
+```
 ## Which to choose
 | Provider | Best for | Trade-off |
@@ -116,6 +175,9 @@ stt: {
 | `deepgram/nova-3` | Arabic, Hindi, Thai, CJK, and 60+ languages | Slightly higher latency; smart_turn + silero VAD |
 | `gladia/solaria` | Code-switching, multilingual | Higher latency than Deepgram |
 | `transcribe` | AWS-native deployments | AWS pricing model |
+| `cartesia/ink-whisper` | Single-vendor with Cartesia TTS | BYOK only |
+| `elevenlabs/scribe` | Single-vendor with ElevenLabs TTS | BYOK only |
+| `assemblyai/universal` | Accuracy + diarization | BYOK only |
 For most agents, start with `deepgram/flux`. Use `deepgram/nova-3` for languages Flux doesn't cover (Arabic, Hindi, Thai, Chinese, Japanese, Korean, etc.).
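
Reviewer's sketch, not part of the package: the closing recommendation in this file (start with `deepgram/flux`, switch to `deepgram/nova-3` for languages Flux does not cover) can be written as a small selector. The source names the languages but not their codes, so the ISO 639-1 codes below are an assumption, and because the source list ends with "etc." the set is incomplete. `defaultStt` is a hypothetical name.

```typescript
// Languages the text names as outside Flux's coverage: Arabic, Hindi, Thai,
// Chinese, Japanese, Korean. The two-letter codes are assumed, not from the source.
const NOVA3_LANGUAGES = new Set(["ar", "hi", "th", "zh", "ja", "ko"]);

// Picks a managed Deepgram model from a language tag such as "en" or "ja-JP".
function defaultStt(language: string): string {
  // Keep only the primary subtag so "ja-JP" and "ja_JP" both resolve to "ja".
  const base = language.trim().toLowerCase().split(/[-_]/)[0] ?? "";
  return NOVA3_LANGUAGES.has(base) ? "deepgram/nova-3" : "deepgram/flux";
}

console.log(defaultStt("en"));    // "deepgram/flux"
console.log(defaultStt("ja-JP")); // "deepgram/nova-3"
```

Anything not in the set falls through to the stated default, so a language Flux does not actually support would still get `deepgram/flux` here; extend the set from the provider's own coverage list before relying on it.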

package/skills/pinecall-reference/references/reference/tts-providers.md CHANGED

@@ -21,6 +21,22 @@ Pinecall supports multiple TTS providers. Use the `provider/friendly-id` format
 > The legacy `provider:rawId` format (e.g. `"elevenlabs:EXAVITQu4vr4xnSDxMaL"`) still works but is not recommended.
+## Managed vs bring-your-own-key (BYOK)
+Data-driven from the rate table — see [Managed vs BYOK](/reference/managed-vs-byok)
+for the full list and the live `GET /api/rates/models` query.
+| TTS provider | Managed (no key needed) | Notes |
+|---|---|---|
+| `elevenlabs` | ✅ Yes | Default, recommended |
+| `cartesia` (sonic) | ✅ Yes | |
+| `polly` (AWS) | ✅ Yes | |
+| `rime` | ❌ BYOK only | Add a Rime key under Provider Keys |
+> **BYOK enforcement:** configuring `rime` without a saved Rime key rejects agent
+> registration with `PROVIDER_KEY_REQUIRED`. With your own key, that usage is billed
+> by the provider directly — **not** deducted from your Pinecall credits.
 ## Discovering voices
 Use the CLI to browse voices. Without flags, you get a catalog overview:
@@ -173,6 +189,21 @@ Shortcut: `"polly/joanna"`
 - `engine: "neural"` is required for natural-sounding output. The older `standard` engine is robotic.
 - Polly is the cheapest option but the least natural — fine for IVR-style flows, not for engaging conversation.
+## Rime (BYOK)
+Ultra-natural, expressive English. Requires your own Rime key.
+```typescript
+voice: {
+  provider: "rime",
+  voice_id: "cove",      // Rime speaker id
+  model: "mistv2",        // or "arcana" (most expressive)
+  speed: 1.0,
+}
+```
+Shortcut: `"rime/cove"`
 ## Which to choose
 | Provider | Best for | Trade-off |
@@ -180,6 +211,7 @@ Shortcut: `"polly/joanna"`
 | **ElevenLabs** | Most natural-sounding output | Higher cost per character |
 | **Cartesia** | Real-time streaming, low latency | Smaller voice library |
 | **Polly** | Cheap IVR, simple flows | Less natural |
+| **Rime** | Ultra-natural expressive English | BYOK only; English-focused |
 For most agents, start with ElevenLabs (`eleven_flash_v2_5`) or Cartesia (`sonic-3`). Use Polly only for high-volume, low-engagement flows.

package/skills/pinecall-guides/references/guides/self-hosted-llm.md DELETED

@@ -1,148 +0,0 @@
----
-title: "Self-Hosted LLM Gateway"
-description: "Consume Pinecall's hosted open model (Qwen3) for chat and structured analysis over an authenticated, plan-gated streaming endpoint."
----
-# Self-Hosted LLM Gateway
-Pinecall hosts an open LLM and exposes it through an authenticated streaming
-endpoint on the sdk-server. Use it for any task that wants a cheap, in-house LLM
-instead of a paid per-token provider: **chat / agent loops** and **structured
-analysis** (classification, extraction, summarization, recommendations).
-| Model | Size | Best for |
-|-------|------|----------|
-| `qwen3:14b` | ~9 GB | **default** — hybrid model: clean JSON/analysis with thinking off, step-by-step reasoning with thinking on |
-| `deepseek-r1:14b` | ~9 GB | dedicated reasoning — **coming soon** |
-| `qwen2.5-coder:14b` | ~9 GB | code generation, refactors, tool/JSON authoring — **coming soon** |
-| `mistral-nemo:12b` | ~7 GB | strong multilingual + 128k context — **coming soon** |
-> Models flagged **coming soon** aren't live yet — `GET /api/llm/models` always
-> returns the currently available set.
-## Authentication & access
-- **Base URL:** `https://voice.pinecall.io`
-- **Auth:** a Pinecall API key via `X-API-Key: <key>` **or** `Authorization: Bearer <key>`.
-- **Plan gating:** **paid plans only** (`starter`, `pro`, `enterprise`). Both `free`
-  and `free_trial` receive **`402 SUBSCRIPTION_REQUIRED`**.
-## `POST /api/llm/chat`
-Streams the completion as **Server-Sent Events**.
-### Request body
-```jsonc
-{
-  "messages":    [{ "role": "user", "content": "..." }],  // required
-  "system":      "optional system prompt",
-  "model":       "qwen3:14b",                             // default: qwen3:14b
-  "mode":        "chat" | "analysis",                     // default: "chat"
-  "think":       false,                                   // reasoning on/off (default false; analysis forces false)
-  "temperature": 0.7,
-  "max_tokens":  512,
-  "format":      { /* JSON schema */ } | "json"           // analysis mode only
-}
-```
-Qwen3 is a **hybrid** model: `think: false` (the default) returns a clean, direct
-answer — best for JSON and low latency. `think: true` lets it reason step-by-step
-first (better on hard problems); the reasoning never leaks into the streamed
-answer. `mode: "analysis"` always forces thinking off so JSON stays clean.
-### SSE event stream
-```
-data: {"type":"token","content":"..."}                 // repeated — incremental text
-data: {"type":"done","usage":{"input_tokens":N,"output_tokens":M}}
-data: {"type":"error","error":"...","code":"UPSTREAM_ERROR|INTERNAL"}
-data: [DONE]                                            // terminator
-```
-### Errors
-| Status | Code | Meaning |
-|--------|------|---------|
-| 401 | `MISSING_KEY` / `INVALID_KEY` | no or bad API key |
-| 402 | `SUBSCRIPTION_REQUIRED` | tier is `free` or `free_trial` |
-| 400 | `MISSING_MESSAGES` / `BAD_MODEL` / `BAD_REQUEST` | invalid request |
-## `GET /api/llm/models`
-Same auth + gate. Returns the available models, the default, and the caller's tier —
-handy to probe access before streaming. **This is the source of truth for what's
-currently available** (the list grows over time).
-```json
-{ "models": ["qwen3:14b"], "default": "qwen3:14b", "tier": "pro" }
-```
-## Chat — streaming agent loop
-```ts
-const res = await fetch("https://voice.pinecall.io/api/llm/chat", {
-  method: "POST",
-  headers: {
-    "Content-Type": "application/json",
-    "X-API-Key": process.env.PINECALL_API_KEY!,
-  },
-  body: JSON.stringify({
-    model: "qwen3:14b",
-    system: "You are a concise assistant.",
-    messages: [{ role: "user", content: "Summarize today's bookings." }],
-    // think: true,  // ← opt into step-by-step reasoning for harder questions
-  }),
-});
-const reader = res.body!.getReader();
-const dec = new TextDecoder();
-let buf = "";
-for (;;) {
-  const { value, done } = await reader.read();
-  if (done) break;
-  buf += dec.decode(value, { stream: true });
-  for (const line of buf.split("\n\n")) {
-    if (!line.startsWith("data: ")) continue;
-    const data = line.slice(6);
-    if (data === "[DONE]") break;
-    const evt = JSON.parse(data);
-    if (evt.type === "token") process.stdout.write(evt.content);
-  }
-  buf = buf.slice(buf.lastIndexOf("\n\n") + 2);
-}
-```
-## Analysis — structured JSON (schema-enforced)
-Set `mode: "analysis"` and pass a JSON **schema** in `format`. The gateway routes
-analysis requests through a native path that constrains the output to your schema
-(and forces thinking off) — ideal for recommendations and extraction.
-```ts
-const body = {
-  model: "qwen3:14b",
-  mode: "analysis",
-  system: "You are a pricing engine. Return JSON only.",
-  messages: [{ role: "user", content: "Service: deep-tissue massage, $80, 95% utilization, 60% margin. Recommend an optimal price." }],
-  format: {
-    type: "object",
-    properties: {
-      suggestedPrice: { type: "number" },
-      confidence:     { type: "string", enum: ["low", "medium", "high"] },
-      rationale:      { type: "string" },
-    },
-    required: ["suggestedPrice", "confidence", "rationale"],
-  },
-};
-// POST as above, accumulate the `token` chunks into `text`, then:
-const rec = JSON.parse(text); // { suggestedPrice, confidence, rationale }
-```
-> **Warning:** Pass a real JSON-schema **object**. The string `"json"` (OpenAI-style
-> `response_format`) only nudges the model toward JSON — it does **not** enforce a shape.
-> **Note:** This open model is for **in-app responders, analysis, and
-> recommendations**. For **live voice / WhatsApp agents**, the Pinecall server-side
-> LLM supports OpenAI / Mistral / Google / Anthropic — see
-> [LLM Providers](/reference/llm-providers).