@pinecall/skills 0.1.3 → 0.1.5
- package/package.json +1 -1
- package/skills/pinecall-guides/SKILL.md +0 -1
- package/skills/pinecall-guides/references/guides/webrtc-browser.md +34 -0
- package/skills/pinecall-sdk-api/references/api/call.md +17 -1
- package/skills/pinecall-sdk-api/references/api/pinecall.md +14 -2
- package/skills/pinecall-guides/references/guides/self-hosted-llm.md +0 -148
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@pinecall/skills",
-  "version": "0.1.
+  "version": "0.1.5",
   "description": "Agent Skills for the Pinecall SDK — installable into Claude Code, Antigravity, Cursor, Copilot and any agent that supports the open Skills format.",
   "type": "module",
   "license": "MIT",
package/skills/pinecall-guides/SKILL.md
CHANGED
@@ -26,7 +26,6 @@ table below indexes every page; open the `references/…` file for the full text
 | **Tools and Functions** | Let your agent take actions: look up data, transfer calls, book appointments. | [`references/guides/tools-and-functions.md`](references/guides/tools-and-functions.md) · [docs](https://docs.pinecall.io/guides/tools-and-functions) |
 | **Knowledge bases (RAG)** | Tutorial — ground a voice or chat agent on your own documents with retrieval-augmented generation. | [`references/guides/knowledge-bases.md`](references/guides/knowledge-bases.md) · [docs](https://docs.pinecall.io/guides/knowledge-bases) |
 | **Multi-Tenant Dashboards** | Host many tenants on one Pinecall instance with scoped event streams. | [`references/guides/multi-tenant.md`](references/guides/multi-tenant.md) · [docs](https://docs.pinecall.io/guides/multi-tenant) |
-| **Self-Hosted LLM Gateway** | Consume Pinecall's hosted open model (Qwen3) for chat and structured analysis over an authenticated, plan-gated streaming endpoint. | [`references/guides/self-hosted-llm.md`](references/guides/self-hosted-llm.md) · [docs](https://docs.pinecall.io/guides/self-hosted-llm) |
 | **SSE Event Streaming** | Stream agent events to your frontend in real time with Server-Sent Events. | [`references/guides/sse-streaming.md`](references/guides/sse-streaming.md) · [docs](https://docs.pinecall.io/guides/sse-streaming) |
 | **WebSocket Event Streaming** | Stream agent events over WebSocket for bidirectional, real-time communication with your frontend. | [`references/guides/ws-streaming.md`](references/guides/ws-streaming.md) · [docs](https://docs.pinecall.io/guides/ws-streaming) |
 | **Dev Mode** | Run dev and production agents on the same phone number, with zero extra Twilio cost. | [`references/guides/dev-mode.md`](references/guides/dev-mode.md) · [docs](https://docs.pinecall.io/guides/dev-mode) |
package/skills/pinecall-guides/references/guides/webrtc-browser.md
CHANGED
@@ -69,6 +69,40 @@ The response shape:
 
 Tokens are single-use, scoped to the agent, and expire in 60 seconds. See [Security](/security) for the full security model.
 
+### Sealed session metadata (trusted context)
+
+Your token endpoint already knows who the user is — it's behind your auth. Bake that
+identity into the token by passing a metadata object as the last argument to
+`createToken`. It's **sealed into the signed token on your server**, so the browser
+cannot forge or change it:
+
+```typescript
+app.get("/api/token", authMiddleware, async (req, res) => {
+  const token = await mara.createToken("webrtc", {
+    userId: req.user.id,
+    plan: req.user.plan,
+    tenantId: req.user.orgId,
+  });
+  res.json(token);
+});
+```
+
+It surfaces — trusted — as `call.metadata` in your agent:
+
+```typescript
+agent.on("call.started", (call) => {
+  console.log(call.metadata.userId, call.metadata.plan); // straight from the sealed token
+});
+```
+
+The same applies to **chat** tokens — mint with `createToken("chat", { ... })` and read
+`call.metadata` in the chat session. (With the `Pinecall` client instead of an `Agent`
+instance, it's `pc.createToken("webrtc", "mara", { ... })`.)
+
+> **Trusted vs client-supplied.** The widget also accepts a `metadata` prop set in the
+> browser — handy, but a user can forge it. For anything you'll act on (identity, plan,
+> entitlements, tenant), seal it in the token here instead of trusting the client prop.
+
 ## 3. Drop in the widget
 
 ```bash
package/skills/pinecall-sdk-api/references/api/call.md
CHANGED
@@ -21,7 +21,7 @@ call.from // "+13186330963" or "sip:..."
 call.to // destination number / URI
 call.direction // "inbound" | "outbound"
 call.transport // "phone" | "webrtc" | "chat" | "whatsapp" | "unknown"
-call.metadata //
+call.metadata // sealed token metadata (createToken), dial() metadata, or channel context
 call.transcript // [{ role: "user", content: "..." }, ...] — user + assistant only
 call.messages // full LLM history (populated on call.ended)
 call.currentBotText // live preview of what the bot is saying (accumulated bot.word events)
@@ -31,6 +31,22 @@ call.endedAt // epoch seconds
 call.reason // "hangup" | "timeout" | ...
 ```
 
+### `metadata`
+
+Arbitrary context attached to the session, available in `call.started` and throughout the call. It comes from one of:
+
+- **Sealed token metadata** (browser WebRTC / chat) — passed to [`createToken(channel, agentId, metadata)`](/api/pinecall) on your server and sealed into the signed token. **Trusted**: the browser can't forge it, so it's safe for identity / plan / tenant.
+- **`dial()` metadata** (outbound calls) — passed when you place the call.
+- **Channel context** — provider-supplied fields for the transport.
+
+```typescript
+agent.on("call.started", (call) => {
+  if (call.metadata?.plan === "pro") enablePremiumTools(call);
+});
+```
+
+The client-supplied `metadata` prop on the browser widget / `VoiceSession` also lands here, but is set in the browser — don't trust it for authorization; seal that in the token instead.
+
 ### `currentBotText`
 
 A live preview of what the bot is currently saying. Built automatically from `bot.word` events — grows word-by-word as TTS plays, resets on each new `bot.speaking`, clears after `bot.finished` or `bot.interrupted`.
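Because `call.metadata` can come from a sealed token, from `dial()`, or from channel context, its shape is not guaranteed at the point of use. The following is a minimal sketch, assuming the `userId` / `plan` / `tenantId` fields used in the examples in this diff, of narrowing it before acting on it. `SessionContext` and `parseSessionContext` are hypothetical names for illustration and are not part of the Pinecall SDK.

```typescript
// Hypothetical helper, not part of the Pinecall SDK.
interface SessionContext {
  userId: string;
  plan: string;
  tenantId?: string;
}

// Returns a typed context when the metadata has the expected shape, else null.
function parseSessionContext(metadata: unknown): SessionContext | null {
  if (typeof metadata !== "object" || metadata === null) return null;
  const { userId, plan, tenantId } = metadata as Record<string, unknown>;
  if (typeof userId !== "string" || typeof plan !== "string") return null;
  if (tenantId === undefined) return { userId, plan };
  if (typeof tenantId !== "string") return null;
  return { userId, plan, tenantId };
}
```

Inside a `call.started` handler this could be used as `const ctx = parseSessionContext(call.metadata)`, followed by a check such as `ctx?.plan === "pro"`. The helper only validates shape. Whether the values can be trusted still depends on their origin, and per the text above only sealed token metadata is safe for authorization.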
package/skills/pinecall-sdk-api/references/api/pinecall.md
CHANGED
@@ -132,15 +132,27 @@ Unregister an agent. Returns `boolean` indicating whether the agent existed.
 const removed = pc.removeAgent("mara");
 ```
 
-### `createToken(channel, agentId)`
+### `createToken(channel, agentId, metadata?)`
 
-Generate a short-lived, single-use token for browser WebRTC or chat connections. Used to mint tokens for browsers.
+Generate a short-lived, single-use token for browser **WebRTC** or **chat** connections. Used to mint tokens for browsers.
 
 ```typescript
 const token = await pc.createToken("webrtc", "mara");
 // { token, server, expiresIn }
 ```
 
+**Sealed session metadata** — pass a third argument to bake trusted context into the token:
+
+```typescript
+const token = await pc.createToken("chat", "mara", { userId: "u_123", plan: "pro" });
+```
+
+The metadata is **sealed into the signed token on your server**, so the browser cannot forge or alter it. It surfaces as [`call.metadata`](/api/call) in your `call.started` handler — use it for per-user / multi-tenant context you can trust (auth identity, plan, tenant id). Works identically for `"webrtc"` and `"chat"`.
+
+> With an `Agent` instance, use `agent.createToken(channel, metadata?)` (the `agentId` is implicit).
+
+> ⚠️ This is **not** the client-supplied `metadata` prop on the widget / `VoiceSession` — that is set in the browser and can be forged. For anything used in authorization, seal it in the token here.
+
 See [Security](/security) for the full token model.
 
 ### `stream(res?, options?)`
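To make "sealed into the signed token" concrete, here is a minimal sketch of the general technique: an HMAC computed over the serialized metadata with Node's built-in `crypto` module. It illustrates signed tokens in general and is not Pinecall's actual token format or implementation. `SECRET`, `sign`, `seal`, and `unseal` are hypothetical names, and the sketch omits the expiry and single-use behaviour the documentation describes.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical server-side secret; the browser never sees it.
const SECRET = "server-side-secret";

// HMAC-SHA256 over the encoded payload.
function sign(payload: string): Buffer {
  return createHmac("sha256", SECRET).update(payload).digest();
}

// Serialize the metadata and append a signature over the serialized form.
function seal(metadata: Record<string, unknown>): string {
  const payload = Buffer.from(JSON.stringify(metadata)).toString("base64url");
  return `${payload}.${sign(payload).toString("base64url")}`;
}

// Return the metadata only if the signature matches; otherwise null.
function unseal(token: string): Record<string, unknown> | null {
  const [payload, signature] = token.split(".");
  if (!payload || !signature) return null;
  const expected = sign(payload);
  const actual = Buffer.from(signature, "base64url");
  // timingSafeEqual requires equal lengths, so check that first.
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}
```

A browser that swaps the payload (for example changing `plan` from `"free"` to `"pro"`) cannot produce a matching signature without the server secret, so `unseal` returns `null`. That property is what allows the server side to treat the metadata as trusted.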
package/skills/pinecall-guides/references/guides/self-hosted-llm.md
DELETED
@@ -1,148 +0,0 @@
----
-title: "Self-Hosted LLM Gateway"
-description: "Consume Pinecall's hosted open model (Qwen3) for chat and structured analysis over an authenticated, plan-gated streaming endpoint."
----
-
-# Self-Hosted LLM Gateway
-
-Pinecall hosts an open LLM and exposes it through an authenticated streaming
-endpoint on the sdk-server. Use it for any task that wants a cheap, in-house LLM
-instead of a paid per-token provider: **chat / agent loops** and **structured
-analysis** (classification, extraction, summarization, recommendations).
-
-| Model | Size | Best for |
-|-------|------|----------|
-| `qwen3:14b` | ~9 GB | **default** — hybrid model: clean JSON/analysis with thinking off, step-by-step reasoning with thinking on |
-| `deepseek-r1:14b` | ~9 GB | dedicated reasoning — **coming soon** |
-| `qwen2.5-coder:14b` | ~9 GB | code generation, refactors, tool/JSON authoring — **coming soon** |
-| `mistral-nemo:12b` | ~7 GB | strong multilingual + 128k context — **coming soon** |
-
-> Models flagged **coming soon** aren't live yet — `GET /api/llm/models` always
-> returns the currently available set.
-
-## Authentication & access
-
-- **Base URL:** `https://voice.pinecall.io`
-- **Auth:** a Pinecall API key via `X-API-Key: <key>` **or** `Authorization: Bearer <key>`.
-- **Plan gating:** **paid plans only** (`starter`, `pro`, `enterprise`). Both `free`
-  and `free_trial` receive **`402 SUBSCRIPTION_REQUIRED`**.
-
-## `POST /api/llm/chat`
-
-Streams the completion as **Server-Sent Events**.
-
-### Request body
-
-```jsonc
-{
-  "messages": [{ "role": "user", "content": "..." }], // required
-  "system": "optional system prompt",
-  "model": "qwen3:14b", // default: qwen3:14b
-  "mode": "chat" | "analysis", // default: "chat"
-  "think": false, // reasoning on/off (default false; analysis forces false)
-  "temperature": 0.7,
-  "max_tokens": 512,
-  "format": { /* JSON schema */ } | "json" // analysis mode only
-}
-```
-
-Qwen3 is a **hybrid** model: `think: false` (the default) returns a clean, direct
-answer — best for JSON and low latency. `think: true` lets it reason step-by-step
-first (better on hard problems); the reasoning never leaks into the streamed
-answer. `mode: "analysis"` always forces thinking off so JSON stays clean.
-
-### SSE event stream
-
-```
-data: {"type":"token","content":"..."} // repeated — incremental text
-data: {"type":"done","usage":{"input_tokens":N,"output_tokens":M}}
-data: {"type":"error","error":"...","code":"UPSTREAM_ERROR|INTERNAL"}
-data: [DONE] // terminator
-```
-
-### Errors
-
-| Status | Code | Meaning |
-|--------|------|---------|
-| 401 | `MISSING_KEY` / `INVALID_KEY` | no or bad API key |
-| 402 | `SUBSCRIPTION_REQUIRED` | tier is `free` or `free_trial` |
-| 400 | `MISSING_MESSAGES` / `BAD_MODEL` / `BAD_REQUEST` | invalid request |
-
-## `GET /api/llm/models`
-
-Same auth + gate. Returns the available models, the default, and the caller's tier —
-handy to probe access before streaming. **This is the source of truth for what's
-currently available** (the list grows over time).
-
-```json
-{ "models": ["qwen3:14b"], "default": "qwen3:14b", "tier": "pro" }
-```
-
-## Chat — streaming agent loop
-
-```ts
-const res = await fetch("https://voice.pinecall.io/api/llm/chat", {
-  method: "POST",
-  headers: {
-    "Content-Type": "application/json",
-    "X-API-Key": process.env.PINECALL_API_KEY!,
-  },
-  body: JSON.stringify({
-    model: "qwen3:14b",
-    system: "You are a concise assistant.",
-    messages: [{ role: "user", content: "Summarize today's bookings." }],
-    // think: true, // ← opt into step-by-step reasoning for harder questions
-  }),
-});
-
-const reader = res.body!.getReader();
-const dec = new TextDecoder();
-let buf = "";
-for (;;) {
-  const { value, done } = await reader.read();
-  if (done) break;
-  buf += dec.decode(value, { stream: true });
-  for (const line of buf.split("\n\n")) {
-    if (!line.startsWith("data: ")) continue;
-    const data = line.slice(6);
-    if (data === "[DONE]") break;
-    const evt = JSON.parse(data);
-    if (evt.type === "token") process.stdout.write(evt.content);
-  }
-  buf = buf.slice(buf.lastIndexOf("\n\n") + 2);
-}
-```
-
-## Analysis — structured JSON (schema-enforced)
-
-Set `mode: "analysis"` and pass a JSON **schema** in `format`. The gateway routes
-analysis requests through a native path that constrains the output to your schema
-(and forces thinking off) — ideal for recommendations and extraction.
-
-```ts
-const body = {
-  model: "qwen3:14b",
-  mode: "analysis",
-  system: "You are a pricing engine. Return JSON only.",
-  messages: [{ role: "user", content: "Service: deep-tissue massage, $80, 95% utilization, 60% margin. Recommend an optimal price." }],
-  format: {
-    type: "object",
-    properties: {
-      suggestedPrice: { type: "number" },
-      confidence: { type: "string", enum: ["low", "medium", "high"] },
-      rationale: { type: "string" },
-    },
-    required: ["suggestedPrice", "confidence", "rationale"],
-  },
-};
-// POST as above, accumulate the `token` chunks into `text`, then:
-const rec = JSON.parse(text); // { suggestedPrice, confidence, rationale }
-```
-
-> **Warning:** Pass a real JSON-schema **object**. The string `"json"` (OpenAI-style
-> `response_format`) only nudges the model toward JSON — it does **not** enforce a shape.
-
-> **Note:** This open model is for **in-app responders, analysis, and
-> recommendations**. For **live voice / WhatsApp agents**, the Pinecall server-side
-> LLM supports OpenAI / Mistral / Google / Anthropic — see
-> [LLM Providers](/reference/llm-providers).
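The streaming loop in the removed guide splits the whole buffer on every read, so a trailing partial event can reach `JSON.parse` before the rest of it arrives. The following is a minimal sketch, assuming the `data: ...` framing and `[DONE]` terminator described in that guide, which parses only complete events and carries the remainder forward. `DrainResult` and `drainSseBuffer` are hypothetical names and are not part of any Pinecall package.

```typescript
interface DrainResult {
  events: unknown[]; // parsed JSON payloads of the complete events
  rest: string;      // trailing partial event, to prepend to the next chunk
  done: boolean;     // true once the [DONE] terminator has been seen
}

function drainSseBuffer(buffer: string): DrainResult {
  const events: unknown[] = [];
  const parts = buffer.split("\n\n");
  // The last piece is either empty or an incomplete event, so it is never parsed.
  const rest = parts.pop() ?? "";
  for (const part of parts) {
    if (!part.startsWith("data: ")) continue;
    const data = part.slice("data: ".length);
    if (data === "[DONE]") return { events, rest: "", done: true };
    events.push(JSON.parse(data));
  }
  return { events, rest, done: false };
}
```

In a read loop, each decoded chunk would be appended to the previous `rest`, passed through `drainSseBuffer`, and the outer loop stopped when `done` is true.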