mohdel 0.110.0 → 0.112.0

package/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Mohdel
2
2
 
3
- One Node API and one CLI for 11 LLM providers: call any model with the same `answer()` shape, get tokens and per-call USD cost back, swap models by changing one string. Self-hosted: your keys, your infra, no SaaS proxy in the path.
3
+ Self-hosted LLM gateway and SDK for Node: think LiteLLM, for the JS world. One `answer()` call for 11 providers; swap models by changing one string; get real per-call USD cost back on every result, with OpenTelemetry built in and process isolation when you need it. Your keys, your infra, no SaaS proxy in the path.
4
4
 
5
5
  ```bash
6
6
  npm install -g mohdel
@@ -12,15 +12,49 @@ Providers: Anthropic, OpenAI, Gemini, Mistral, Groq, xAI, Cerebras, Fireworks, D
12
12
 
13
13
  ## Why mohdel
14
14
 
15
+ - **Real numbers on every call.** Token counts and per-call USD cost computed from your own pricing catalog (`curated.json`) — not estimates, not provider-specific shapes. Bill tenants, alert on spend, reconcile invoices. See [docs/CATALOG.md](docs/CATALOG.md) for the catalog format.
15
16
  - **One interface across providers.** Same `answer()` call, same event stream, same `{ status, output, inputTokens, outputTokens, cost }` result. Switching from `anthropic/claude-sonnet-4-6` to `openai/gpt-5.4-mini` is one string change — adapter differences stay inside mohdel.
16
- - **Real numbers on every call.** Token counts and per-call USD cost computed from your own pricing catalog (`curated.json`), not estimates, not provider-specific shapes. See [docs/CATALOG.md](docs/CATALOG.md) for the catalog format.
17
+ - **Self-hosted, no vendor in the path.** API keys live in `~/.config/mohdel/`. Mohdel calls provider APIs directly; nothing routes through a third party, nothing marks up your tokens, no extra hop of availability risk.
17
18
  - **Observability without instrumentation.** OpenTelemetry spans, trace-linked logs, and OTLP metrics over one endpoint. Set `OTEL_EXPORTER_OTLP_ENDPOINT`; everything else is wired.
18
19
  - **Two integration paths, same API.** In-process factory for CLI tools, scripts, single-process services. Optional `thin-gate` subprocess for fault isolation, cross-process quota, and any-language HTTP callers — no code change to switch.
19
- - **Self-hosted, no vendor in the path.** API keys live in `~/.config/mohdel/`. Mohdel calls provider APIs directly; nothing routes through a third party.
20
+
21
+ ## How it compares
22
+
23
+ The one-paragraph version: **LiteLLM** is the closest analog but lives in
24
+ Python; **Vercel AI SDK** is an application toolkit, not an infra layer;
25
+ **OpenRouter** is the same one-API promise as a SaaS in your request path;
26
+ **raw provider SDKs** are N different shapes with no cost accounting.
27
+
28
+ | | mohdel | LiteLLM | Vercel AI SDK | OpenRouter | Raw SDKs |
29
+ |---|---|---|---|---|---|
30
+ | Runs in a Node stack natively | yes | Python service | yes | n/a (SaaS) | yes |
31
+ | Per-call USD cost on the result | yes | yes | no | yes | no |
32
+ | Self-hosted, keys never leave your infra | yes | yes | yes | no | yes |
33
+ | Provider-SDK process isolation | yes (thin-gate) | proxy only | no | n/a | no |
34
+ | OTel spans + metrics out of the box | yes | via callbacks | no | no | no |
35
+ | UI streaming helpers, structured output, agents | no — by design | no | yes | no | varies |
36
+
37
+ - **vs LiteLLM** — same core promise (unified calls, cost tracking,
38
+ self-hosted gateway), but Node-native: if your stack is JS, there's no
39
+ Python sidecar to deploy, version, and monitor. The honest gap: LiteLLM's
40
+ proxy exposes an OpenAI-compatible endpoint and admin features (virtual
41
+ keys, budgets); thin-gate speaks its own [wire protocol](PROTOCOL.md) —
42
+ callers use the JS client or implement the protocol.
43
+ - **vs Vercel AI SDK** — different layer, not a rival. The AI SDK is an
44
+ application toolkit (UI streaming, structured outputs, agent loops) with no
45
+ per-call cost, no gateway, no process isolation. Use it *above* mohdel if
46
+ you like it — mohdel is the inference primitive underneath.
47
+ - **vs OpenRouter** — the self-hosted version of the same idea. With a SaaS
48
+ router you accept their uptime, their markup, and your prompts transiting
49
+ their infra. Mohdel goes direct to providers with your keys — and ships an
50
+ `openrouter` adapter for when you want both.
51
+ - **vs raw provider SDKs** — no abstraction tax to escape later: mohdel's
52
+ envelope is flat and close to the SDKs underneath, and `cost`/`tokens`
53
+ come back normalized so you never parse five different usage shapes.
20
54
 
21
55
  ## Documentation
22
56
 
23
- - [INTEGRATION.md](INTEGRATION.md) — JS library guide (factory, client, answer options, tools, streaming, vision, errors, OTel)
57
+ - [INTEGRATION.md](INTEGRATION.md) — JS library guide (factory, client, answer options, tools, streaming, vision, transcription, errors, OTel)
24
58
  - [docs/COOKBOOK.md](docs/COOKBOOK.md) — copy-paste recipes (summarize a file, stream, swap providers, tools, vision, batch + cost)
25
59
  - [docs/CATALOG.md](docs/CATALOG.md) — `curated.json` walkthrough with worked examples
26
60
  - [docs/GLOSSARY.md](docs/GLOSSARY.md) — short definitions for envelope, thin-gate, session, creator vs provider, status, …
@@ -67,6 +101,10 @@ mo ask anthropic/claude-sonnet-4-6 --stream "write a haiku about recursion"
67
101
  # With thinking effort
68
102
  mo ask anthropic/claude-opus-4-6 --effort high "prove P != NP"
69
103
 
104
+ # Speech → text from an audio file
105
+ mo transcribe groq/whisper-large-v3-turbo meeting.mp3
106
+ mo transcribe mistral/voxtral-mini-transcribe interview.wav --language fr
107
+
70
108
  # Browse the model catalog
71
109
  mo ls # list all curated models
72
110
  mo ls --sort price # sorted by input price
@@ -101,9 +139,21 @@ All list/show commands support `--json [fields]` — bare `--json` lists availab
101
139
 
102
140
  ## Library Usage
103
141
 
104
- Two integration paths: the **client** (primary, cross-process) and the **factory** (in-process shortcut).
142
+ Two integration paths, same adapters underneath: start with the in-process **factory**; graduate to the cross-process **client** when you want gateway-grade isolation.
143
+
144
+ ### Factory — in-process (start here)
145
+
146
+ ```js
147
+ import mohdel from 'mohdel'
148
+
149
+ const mo = await mohdel()
150
+ const result = await mo.use('anthropic/claude-sonnet-4-6').answer('Hello')
151
+ console.log(result.output, result.cost)
152
+ ```
153
+
154
+ No subprocess, no setup beyond your API key. Right for CLI tools (`mo ask`), scripts, tests, and single-process services — which is most projects.
105
155
 
106
- ### Client — cross-process (recommended)
156
+ ### Client — cross-process (the production gateway)
107
157
 
108
158
  ```js
109
159
  import { call } from 'mohdel/client'
@@ -119,19 +169,7 @@ for await (const ev of call(envelope, { socketPath: '/tmp/mohdel-data.sock' }))
119
169
  }
120
170
  ```
121
171
 
122
- Requires a running `thin-gate` subprocess. See [INTEGRATION.md §Client](INTEGRATION.md#client-cross-process--primary-production-integration) for setup.
123
-
124
- ### Factory — in-process shortcut
125
-
126
- ```js
127
- import mohdel from 'mohdel'
128
-
129
- const mo = await mohdel()
130
- const result = await mo.use('anthropic/claude-sonnet-4-6').answer('Hello')
131
- console.log(result.output, result.cost)
132
- ```
133
-
134
- No subprocess; the factory runs the same session adapters inline. Right for CLI (`mo ask`), scripts, tests, single-process services.
172
+ Same API, but inference runs in a pooled subprocess behind the `thin-gate` supervisor (Rust): a crashing provider SDK can't take your service down, quota is enforced across processes, and non-JS callers can speak the same wire. Switching from factory to client is a configuration change, not a rewrite. See [INTEGRATION.md §Client](INTEGRATION.md#client-cross-process--primary-production-integration) for setup.
135
173
 
136
174
  For the full API — initialization, alias resolution, answer options, response shape, tool use, streaming, vision, error handling, OpenTelemetry, sub-path exports — see **[INTEGRATION.md](INTEGRATION.md)**.
137
175
 
@@ -61,6 +61,18 @@
61
61
  "tags": ["image"]
62
62
  },
63
63
 
64
+ "groq/whisper-large-v3-turbo": {
65
+ "_comment_f": "Transcription entry. type:'transcription' selects the speech-to-text dispatcher (multipart upload, no streaming). transcriptionPrice is per audio MINUTE — Groq's $0.04/hour ≈ $0.000667/min. Token-billed models (openai/gpt-4o-mini-transcribe) use inputPrice/outputPrice instead.",
66
+ "model": "whisper-large-v3-turbo",
67
+ "creator": "openai",
68
+ "provider": "groq",
69
+ "label": "Whisper Large v3 Turbo",
70
+ "inputFormat": ["audio"],
71
+ "type": "transcription",
72
+ "transcriptionPrice": 0.000667,
73
+ "tags": ["transcription", "fast"]
74
+ },
75
+
64
76
  "openai/gpt-5.4-mini": {
65
77
  "_comment_e": "Entry with custom rate limits. rpmLimit and tpmLimit override the provider-level defaults in providers.json. rateLimitScope:'model' means the limit is per-model; 'provider' means it joins the provider-level pool.",
66
78
  "model": "gpt-5.4-mini",
@@ -45,7 +45,7 @@
45
45
  "inputFormat": {
46
46
  "type": "array",
47
47
  "description": "Accepted input modalities. Defaults to ['text'].",
48
- "items": { "type": "string", "enum": ["text", "image", "video"] },
48
+ "items": { "type": "string", "enum": ["text", "image", "video", "audio"] },
49
49
  "minItems": 1,
50
50
  "uniqueItems": true
51
51
  },
@@ -56,9 +56,9 @@
56
56
  "sdk": { "type": "string", "description": "SDK adapter to use (some providers require this)." },
57
57
  "type": {
58
58
  "type": "string",
59
- "enum": ["model", "image"],
59
+ "enum": ["model", "image", "transcription"],
60
60
  "default": "model",
61
- "description": "'model' for chat/completion, 'image' for image generation."
61
+ "description": "'model' for chat/completion, 'image' for image generation, 'transcription' for speech-to-text."
62
62
  },
63
63
  "label": { "type": "string", "description": "Human-readable name shown in UIs." },
64
64
  "description": { "type": "string" },
@@ -137,6 +137,7 @@
137
137
  "imagePrice": { "type": "number", "minimum": 0, "description": "USD per generated image (image-type entries)." },
138
138
  "imageEndpoint": { "type": "string", "description": "Provider-side image endpoint name." },
139
139
  "imageDefaultSize": { "type": "string", "description": "Default size when envelope omits one (e.g. '1024x1024')." },
140
+ "transcriptionPrice": { "type": "number", "minimum": 0, "description": "USD per audio minute (transcription-type entries). Token-billed transcription models (OpenAI gpt-4o-*-transcribe) use inputPrice/outputPrice instead." },
140
141
 
141
142
  "rpmLimit": { "type": "integer", "minimum": 1, "description": "Requests per minute. Overrides provider default." },
142
143
  "tpmLimit": { "type": "integer", "minimum": 1, "description": "Tokens per minute. Overrides provider default." },
@@ -0,0 +1,85 @@
1
+ /**
2
+ * Send a TranscriptionEnvelope to thin-gate's `POST /v1/transcription`.
3
+ *
4
+ * Transcription is one-shot: single JSON response body, no streaming,
5
+ * no cooldown/rate-limit. `audio.fileUri` must be a `file://` or
6
+ * `data:` URI — `file://` requires that the gate's sessions share a
7
+ * filesystem with the caller; `data:` carries the bytes inline subject
8
+ * to the gate's body-size cap.
9
+ *
10
+ * @module client/call_transcription
11
+ */
12
+
13
+ import { requestUnix } from './transport.js'
14
+ import { MohdelTypedError } from '#core'
15
+
16
+ /**
17
+ * @param {import('#core/transcription.js').TranscriptionEnvelope} envelope
18
+ * @param {object} options
19
+ * @param {string} options.socketPath
20
+ * @param {AbortSignal} [options.signal]
21
+ * @param {string} [options.path] HTTP path; defaults to '/v1/transcription'
22
+ * @returns {Promise<import('#core/transcription.js').TranscriptionResult>}
23
+ */
24
+ export async function callTranscription (envelope, { socketPath, signal, path = '/v1/transcription' }) {
25
+ const res = await requestUnix({
26
+ socketPath,
27
+ path,
28
+ method: 'POST',
29
+ body: envelope,
30
+ signal
31
+ })
32
+
33
+ const body = await readAll(res)
34
+
35
+ if (res.statusCode !== 200) {
36
+ throw MohdelTypedError.fromJSON(parseErrorBody(body, res.statusCode ?? 0))
37
+ }
38
+
39
+ let parsed
40
+ try {
41
+ parsed = JSON.parse(body)
42
+ } catch (e) {
43
+ throw new MohdelTypedError(
44
+ 'thin-gate returned non-JSON transcription response',
45
+ { type: 'PROTOCOL_INVALID_EVENT', retryable: false }
46
+ )
47
+ }
48
+
49
+ if (!parsed || typeof parsed !== 'object' || parsed.status !== 'completed' || typeof parsed.text !== 'string') {
50
+ throw new MohdelTypedError(
51
+ 'thin-gate returned malformed TranscriptionResult',
52
+ { type: 'PROTOCOL_INVALID_EVENT', retryable: false }
53
+ )
54
+ }
55
+ return parsed
56
+ }
57
+
58
+ /**
59
+ * @param {AsyncIterable<Buffer|string>} stream
60
+ * @returns {Promise<string>}
61
+ */
62
+ async function readAll (stream) {
63
+ let s = ''
64
+ for await (const c of stream) s += typeof c === 'string' ? c : c.toString('utf8')
65
+ return s
66
+ }
67
+
68
+ /**
69
+ * @param {string} body
70
+ * @param {number} status
71
+ * @returns {import('#core/errors.js').TypedError}
72
+ */
73
+ function parseErrorBody (body, status) {
74
+ try {
75
+ const parsed = JSON.parse(body)
76
+ if (parsed && typeof parsed === 'object' && typeof parsed.type === 'string') {
77
+ return parsed
78
+ }
79
+ } catch {}
80
+ return {
81
+ type: 'PROTOCOL_HTTP_ERROR',
82
+ message: `thin-gate returned HTTP ${status}`,
83
+ retryable: status >= 500
84
+ }
85
+ }
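As a rough illustration of what a caller might hand to `callTranscription`, the sketch below builds a `TranscriptionEnvelope` that carries the audio inline as a `data:` URI. The helper name `buildTranscriptionEnvelope`, the `demo-1` call id, the `{ key }` auth shape, and the placeholder API key are assumptions for illustration; the field names follow the typedefs in `core/transcription.js`, and the `mohdel/client` import path is inferred from the README's `call` example.

```javascript
// Hypothetical helper: wrap raw audio bytes in a TranscriptionEnvelope.
// Field names follow the TranscriptionEnvelope typedef; values are placeholders.
function buildTranscriptionEnvelope ({ callId, model, bytes, mimeType, apiKey, language }) {
  const envelope = {
    callId,
    authId: 'local',
    auth: { key: apiKey }, // assumed shape, based on `envelope.auth.key` in the adapter
    model,
    audio: {
      // data: URIs carry the bytes inline, so no shared filesystem is needed
      fileUri: `data:${mimeType};base64,${Buffer.from(bytes).toString('base64')}`,
      mimeType
    }
  }
  if (language) envelope.language = language
  return envelope
}

const envelope = buildTranscriptionEnvelope({
  callId: 'demo-1',
  model: 'groq/whisper-large-v3-turbo',
  bytes: Uint8Array.from([0x52, 0x49, 0x46, 0x46]), // "RIFF", a stand-in for real WAV data
  mimeType: 'audio/wav',
  apiKey: 'YOUR_API_KEY',
  language: 'en'
})
console.log(envelope.audio.fileUri)

// With a running thin-gate (the socket path is an assumption), the call would be roughly:
//   import { callTranscription } from 'mohdel/client'
//   const result = await callTranscription(envelope, { socketPath: '/tmp/mohdel-data.sock' })
//   console.log(result.text, result.cost)
```

Inline `data:` payloads are subject to the gate's body-size cap mentioned in the module comment, so a `file://` URI is probably the better fit for long recordings when the caller and the gate share a filesystem.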
@@ -4,6 +4,7 @@
4
4
  * Public surface (0.90):
5
5
  * - call(envelope, { socketPath, signal }): AsyncGenerator<Event>
6
6
  * - callImage(envelope, { socketPath, signal }): Promise<ImageResult>
7
+ * - callTranscription(envelope, { socketPath, signal }): Promise<TranscriptionResult>
7
8
  *
8
9
  * No provider SDKs are imported transitively. This module can be
9
10
  * consumed by callers that must not pull openai-node, anthropic-sdk,
@@ -14,3 +15,4 @@
14
15
 
15
16
  export { call } from './call.js'
16
17
  export { callImage } from './call_image.js'
18
+ export { callTranscription } from './call_transcription.js'
@@ -0,0 +1,70 @@
1
+ /**
2
+ * Transcription (voice → text) envelope and result.
3
+ *
4
+ * Separate call path from `CallEnvelope` / `AnswerResult`: transcription
5
+ * is a single synchronous request/response (no streaming) against a
6
+ * provider's `/audio/transcriptions` endpoint.
7
+ * Result shape: `{ status, text, language, durationSeconds, cost, timestamps }`.
8
+ *
9
+ * @module core/transcription
10
+ */
11
+
12
+ /**
13
+ * @typedef {object} AudioRef
14
+ * @property {string} fileUri
15
+ * `file://` or `data:` URI. Remote `https://` audio is not
16
+ * supported — providers require multipart upload, so the caller
17
+ * owns the download.
18
+ * @property {string} mimeType e.g. "audio/mpeg", "audio/wav".
19
+ */
20
+
21
+ /**
22
+ * @typedef {object} TranscriptionEnvelope
23
+ *
24
+ * @property {string} callId
25
+ * @property {string} authId
26
+ * @property {import('./envelope.js').Auth} auth
27
+ * @property {string} [traceparent]
28
+ * @property {string} [baggage]
29
+ *
30
+ * @property {import('./model-id.js').ModelId} model
31
+ * Full mohdel id — `"<provider>/<bare>"`. Same shape as
32
+ * `CallEnvelope.model` (see `envelope.js`).
33
+ * @property {AudioRef} audio
34
+ *
35
+ * @property {string} [language] ISO-639-1 hint (e.g. "en", "fr").
36
+ * @property {string} [prompt] Spelling/context hint forwarded to the provider.
37
+ */
38
+
39
+ /**
40
+ * @typedef {object} TranscriptionResult
41
+ *
42
+ * @property {'completed'} status
43
+ * Transcriptions are one-shot — no `incomplete` state.
44
+ * @property {string} text
45
+ * @property {string | null} language
46
+ * Detected (or echoed) language when the provider reports one.
47
+ * @property {number | null} durationSeconds
48
+ * Audio duration as reported by the provider; null when not reported.
49
+ * @property {number} [inputTokens]
50
+ * Present only for token-billed providers (OpenAI gpt-4o-*-transcribe).
51
+ * @property {number} [outputTokens]
52
+ * @property {number} cost
53
+ * USD. `transcriptionPrice` (per audio minute) × duration when the
54
+ * provider reports duration; token pricing fallback otherwise; 0 when
55
+ * the spec carries no usable price.
56
+ * @property {{start: string, first: string, end: string}} timestamps
57
+ * hrtime-bigint-as-string. `first` = `end` (no streaming).
58
+ */
59
+
60
+ export const TRANSCRIPTION_ENVELOPE_FIELDS = Object.freeze([
61
+ 'callId',
62
+ 'authId',
63
+ 'auth',
64
+ 'traceparent',
65
+ 'baggage',
66
+ 'model',
67
+ 'audio',
68
+ 'language',
69
+ 'prompt'
70
+ ])
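How the package consumes `TRANSCRIPTION_ENVELOPE_FIELDS` is not shown in this diff. One plausible use, sketched here under that assumption, is a pre-flight check that rejects unknown keys and missing required fields before an envelope crosses the process boundary. `checkEnvelope` and `REQUIRED` are hypothetical names; the required set is read off the typedef (the properties without `[brackets]`).

```javascript
// Copy of the frozen field list from core/transcription.js.
const TRANSCRIPTION_ENVELOPE_FIELDS = Object.freeze([
  'callId', 'authId', 'auth', 'traceparent', 'baggage',
  'model', 'audio', 'language', 'prompt'
])

// Hypothetical: fields the typedef declares as non-optional.
const REQUIRED = ['callId', 'authId', 'auth', 'model', 'audio']

// Hypothetical pre-flight check; returns a report instead of throwing.
function checkEnvelope (envelope) {
  const unknown = Object.keys(envelope).filter(k => !TRANSCRIPTION_ENVELOPE_FIELDS.includes(k))
  const missing = REQUIRED.filter(k => envelope[k] === undefined)
  return { ok: unknown.length === 0 && missing.length === 0, unknown, missing }
}

console.log(checkEnvelope({
  callId: 'demo-1',
  authId: 'local',
  auth: { key: 'YOUR_API_KEY' },
  model: 'fake/whisper',
  audio: { fileUri: 'data:audio/wav;base64,UklGRg==', mimeType: 'audio/wav' }
}))
console.log(checkEnvelope({ callId: 'demo-2', op: 'transcription' }))
```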
@@ -22,6 +22,7 @@
22
22
 
23
23
  import { run } from '../session/run.js'
24
24
  import { runImage } from '../session/run_image.js'
25
+ import { runTranscription } from '../session/run_transcription.js'
25
26
  import { MohdelError, Severity } from '../../src/lib/errors.js'
26
27
  import { createRealtimeDeltaBuffer } from '../../src/lib/utils.js'
27
28
 
@@ -149,6 +150,39 @@ export async function runAnswerImage ({ provider, model, configuration, prompt,
149
150
  return out.result
150
151
  }
151
152
 
153
+ /**
154
+ * Run a `transcribe()` call through the /session runtime.
155
+ *
156
+ * @param {object} args
157
+ * @param {string} args.provider
158
+ * @param {string} args.model
159
+ * @param {any} args.configuration
160
+ * @param {{fileUri: string, mimeType: string}} args.audio
161
+ * @param {any} [args.options] `language` / `prompt` map onto the
162
+ * envelope; `callId` / `authId` are
163
+ * transport metadata.
164
+ * @param {any} [args.spec] modelSpec passthrough so the adapter
165
+ * picks up `model` and
166
+ * `transcriptionPrice` without
167
+ * re-reading the catalog.
168
+ * @returns {Promise<any>}
169
+ */
170
+ export async function runAnswerTranscription ({ provider, model, configuration, audio, options = {}, spec }) {
171
+ const envelope = {
172
+ callId: options.callId || newCallId(),
173
+ authId: options.authId || 'local',
174
+ auth: configToAuth(configuration),
175
+ model: `${provider}/${model}`,
176
+ audio
177
+ }
178
+ if (options.language) envelope.language = options.language
179
+ if (options.prompt) envelope.prompt = options.prompt
180
+
181
+ const out = await runTranscription(envelope, spec ? { spec } : {})
182
+ if (!out.ok) throw fromTypedError(out.error, { provider, model })
183
+ return out.result
184
+ }
185
+
152
186
  /**
153
187
  * @param {object} args
154
188
  * @param {string} args.modelKey Mohdel catalog key `<provider>/<bare>`. The
@@ -100,6 +100,38 @@ export function costFor (model, usage) {
100
100
  return computeCost(getSpec(model), usage)
101
101
  }
102
102
 
103
+ /**
104
+ * Cost of a transcription call.
105
+ *
106
+ * Providers bill speech-to-text two ways, and the catalog supports both:
107
+ *
108
+ * - `transcriptionPrice` — flat USD per audio **minute** (Groq,
109
+ * Mistral; the industry quoting unit). Used when the provider
110
+ * reported the audio duration.
111
+ * - token pricing (`inputPrice`/`outputPrice`) — OpenAI's
112
+ * gpt-4o-*-transcribe models report token usage instead of
113
+ * duration; falls through to `computeCost`.
114
+ *
115
+ * Duration wins when both are available. Unknown models or specs
116
+ * without prices return `0` — same graceful degradation as
117
+ * `computeCost`.
118
+ *
119
+ * @param {any} spec Catalog entry, or `undefined`.
120
+ * @param {{durationSeconds?: number | null, inputTokens?: number, outputTokens?: number}} usage
121
+ * @returns {number}
122
+ */
123
+ export function computeTranscriptionCost (spec, usage) {
124
+ if (!spec) return 0
125
+ const seconds = usage.durationSeconds
126
+ if (typeof seconds === 'number' && seconds > 0 && typeof spec.transcriptionPrice === 'number') {
127
+ return round((seconds / 60) * spec.transcriptionPrice)
128
+ }
129
+ if (usage.inputTokens || usage.outputTokens) {
130
+ return computeCost(spec, { inputTokens: usage.inputTokens, outputTokens: usage.outputTokens })
131
+ }
132
+ return 0
133
+ }
134
+
103
135
  /**
104
136
  * Test convenience: inject pricing-only specs by model id. Wraps
105
137
  * `setCatalog` with the `{input, output, thinking?}` shape used in
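The duration-first rule in `computeTranscriptionCost` can be checked by hand with a small stand-alone sketch. The `tokenCost` parameter is a hypothetical stand-in for the package's `computeCost`, which this diff does not show, and the sketch skips the `round` helper because its precision is not visible here. The $0.04/hour figure comes from the catalog comment for `groq/whisper-large-v3-turbo`.

```javascript
// Sketch of the duration-first pricing rule. `tokenCost` stands in for the
// package's `computeCost`; it is an assumption, not the real implementation.
function transcriptionCost (spec, usage, tokenCost = () => 0) {
  if (!spec) return 0
  const seconds = usage.durationSeconds
  if (typeof seconds === 'number' && seconds > 0 && typeof spec.transcriptionPrice === 'number') {
    // transcriptionPrice is USD per audio minute; duration arrives in seconds
    return (seconds / 60) * spec.transcriptionPrice
  }
  if (usage.inputTokens || usage.outputTokens) return tokenCost(spec, usage)
  return 0
}

// Groq quotes $0.04 per audio hour; the catalog stores the per-minute figure.
const perMinute = 0.04 / 60
console.log(perMinute.toFixed(6)) // "0.000667"

const groq = { transcriptionPrice: 0.000667 }
// 90 seconds is 1.5 minutes, so roughly 1.5 × 0.000667 USD
console.log(transcriptionCost(groq, { durationSeconds: 90 }))
// No duration reported: fall back to the (stand-in) token pricing
console.log(transcriptionCost(groq, { durationSeconds: null, inputTokens: 120 }, () => 0.0003))
// Unknown model: graceful zero
console.log(transcriptionCost(undefined, { durationSeconds: 90 }))
```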
@@ -0,0 +1,53 @@
1
+ /**
2
+ * Fake transcription adapter — scenario-driven for tests and bug
3
+ * reproductions. Never calls a real API.
4
+ *
5
+ * Mirrors the `fake` image adapter shape: the envelope's `prompt`
6
+ * field carries a JSON scenario spec; the `mode` key picks a
7
+ * behavior. Missing / non-JSON prompts fall through to `mode: "ok"`.
8
+ *
9
+ * ## Modes
10
+ *
11
+ * | mode | params | behavior |
12
+ * |---------|---------------------------------|--------------------------------|
13
+ * | `ok` | `text?`, `durationSeconds?` | returns a canned transcription |
14
+ * | `error` | `type`, `message` | throws a tagged error |
15
+ *
16
+ * @module session/adapters/transcription/fake
17
+ */
18
+
19
+ /**
20
+ * @param {import('#core/transcription.js').TranscriptionEnvelope} envelope
21
+ * @returns {Promise<import('#core/transcription.js').TranscriptionResult>}
22
+ */
23
+ export async function fakeTranscription (envelope) {
24
+ const scenario = parseScenario(envelope.prompt)
25
+ const mode = scenario.mode ?? 'ok'
26
+
27
+ if (mode === 'error') {
28
+ const err = new Error(scenario.message || 'fake transcription error')
29
+ err.typed = {
30
+ message: scenario.message || 'fake transcription error',
31
+ severity: 'error',
32
+ retryable: !!scenario.retryable,
33
+ type: scenario.type || 'PROVIDER_ERROR'
34
+ }
35
+ throw err
36
+ }
37
+
38
+ const now = `${process.hrtime.bigint()}`
39
+ return {
40
+ status: 'completed',
41
+ text: scenario.text ?? `fake transcript for ${envelope.callId}`,
42
+ language: scenario.language ?? 'en',
43
+ durationSeconds: scenario.durationSeconds ?? 1,
44
+ cost: 0,
45
+ timestamps: { start: now, first: now, end: now }
46
+ }
47
+ }
48
+
49
+ /** @param {unknown} prompt */
50
+ function parseScenario (prompt) {
51
+ if (typeof prompt !== 'string') return {}
52
+ try { return JSON.parse(prompt) || {} } catch { return {} }
53
+ }
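To make the scenario mechanism concrete, the sketch below reuses the fake adapter's parsing rule and shows which mode each kind of `prompt` would select. The sample prompts are illustrative; only the `mode`, `text`, `durationSeconds`, `type`, `message`, and `retryable` keys are taken from the adapter above.

```javascript
// Same parsing rule as the fake adapter: non-string or non-JSON prompts become {}.
function parseScenario (prompt) {
  if (typeof prompt !== 'string') return {}
  try { return JSON.parse(prompt) || {} } catch { return {} }
}

// Illustrative prompts; a test would place one of these in `envelope.prompt`.
const okPrompt = JSON.stringify({ mode: 'ok', text: 'hello world', durationSeconds: 42 })
const errPrompt = JSON.stringify({ mode: 'error', type: 'PROVIDER_ERROR', message: 'boom', retryable: true })

for (const prompt of [okPrompt, errPrompt, 'plain spelling hint', undefined]) {
  const scenario = parseScenario(prompt)
  console.log(scenario.mode ?? 'ok') // prints: ok, error, ok, ok
}
```

Because an ordinary spelling hint is not valid JSON, it falls through to `mode: "ok"`, which means the fake adapter can sit behind code that also passes real prompts.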
@@ -0,0 +1,58 @@
1
+ /**
2
+ * Transcription-adapter registry. Mirrors session/adapters/image but
3
+ * scoped to speech-to-text providers.
4
+ *
5
+ * Groq, OpenAI, and Mistral all expose the same OpenAI-compatible
6
+ * `POST /audio/transcriptions` multipart endpoint, so each entry is
7
+ * the shared adapter bound to per-provider knobs:
8
+ *
9
+ * - `baseURL` — the provider's OpenAI-compatible API root.
10
+ * - `responseFormat` — `verbose_json` where supported (returns
11
+ * `duration`, needed for per-minute pricing). OpenAI's
12
+ * gpt-4o-*-transcribe models reject `verbose_json` (plain `json`
13
+ * returns token usage instead); Mistral rejects the field
14
+ * entirely (its default response already carries
15
+ * `usage.prompt_audio_seconds`).
16
+ *
17
+ * @module session/adapters/transcription
18
+ */
19
+
20
+ import { createTranscriptionAdapter } from './openai_compatible.js'
21
+ import { fakeTranscription } from './fake.js'
22
+
23
+ const TRANSCRIPTION_ADAPTERS = {
24
+ groq: createTranscriptionAdapter({
25
+ baseURL: 'https://api.groq.com/openai/v1',
26
+ responseFormat: 'verbose_json'
27
+ }),
28
+ openai: createTranscriptionAdapter({
29
+ baseURL: 'https://api.openai.com/v1',
30
+ responseFormat: 'json'
31
+ }),
32
+ mistral: createTranscriptionAdapter({
33
+ baseURL: 'https://api.mistral.ai/v1'
34
+ }),
35
+ fake: fakeTranscription
36
+ }
37
+
38
+ /**
39
+ * @param {string} provider
40
+ * @returns {(
41
+ * env: import('#core/transcription.js').TranscriptionEnvelope,
42
+ * deps?: any
43
+ * ) => Promise<import('#core/transcription.js').TranscriptionResult>}
44
+ */
45
+ export function getTranscriptionAdapter (provider) {
46
+ const adapter = TRANSCRIPTION_ADAPTERS[provider]
47
+ if (!adapter) throw new Error(`no transcription adapter for provider: ${provider}`)
48
+ return adapter
49
+ }
50
+
51
+ /**
52
+ * Whether the provider has a transcription adapter registered.
53
+ *
54
+ * @param {string} provider
55
+ */
56
+ export function isTranscriptionProvider (provider) {
57
+ return Object.prototype.hasOwnProperty.call(TRANSCRIPTION_ADAPTERS, provider)
58
+ }
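One detail of the registry is worth a small demonstration: `isTranscriptionProvider` uses an own-property check, while `getTranscriptionAdapter` indexes the object directly, so an inherited key such as `toString` would pass the truthiness test in the getter. The sketch below uses a stand-in registry and a hypothetical stricter lookup (`getAdapter`) that reuses the own-property check. It is one possible hardening, not the package's code.

```javascript
// Stand-in registry; the real one binds provider adapters.
const ADAPTERS = {
  groq: async () => 'groq result',
  fake: async () => 'fake result'
}

// Mirrors isTranscriptionProvider: only keys defined on the object itself count.
function hasAdapter (provider) {
  return Object.prototype.hasOwnProperty.call(ADAPTERS, provider)
}

// Hypothetical stricter lookup: inherited keys are rejected instead of
// resolving to Object.prototype members.
function getAdapter (provider) {
  if (!hasAdapter(provider)) throw new Error(`no transcription adapter for provider: ${provider}`)
  return ADAPTERS[provider]
}

console.log(typeof ADAPTERS['toString']) // "function": plain indexing sees inherited members
console.log(hasAdapter('toString')) // false
console.log(hasAdapter('groq')) // true
```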
@@ -0,0 +1,177 @@
1
+ /**
2
+ * Shared transcription adapter for OpenAI-compatible
3
+ * `POST <baseURL>/audio/transcriptions` endpoints (multipart upload).
4
+ *
5
+ * Groq, Mistral, and OpenAI all implement the same endpoint shape;
6
+ * only the base URL and the supported `response_format` differ, so
7
+ * one adapter covers all three. Per-provider knobs are bound via
8
+ * `createTranscriptionAdapter` in `./index.js`.
9
+ *
10
+ * Duration extraction (for per-minute pricing) is response-shape
11
+ * dependent:
12
+ * - `body.duration` — whisper `verbose_json` (Groq)
13
+ * - `body.usage.seconds` — OpenAI duration-type usage
14
+ * - `body.usage.prompt_audio_seconds` — Mistral Voxtral
15
+ * OpenAI's gpt-4o-*-transcribe models report token usage instead;
16
+ * `computeTranscriptionCost` falls back to token pricing for those.
17
+ *
18
+ * @module session/adapters/transcription/openai_compatible
19
+ */
20
+
21
+ import { readFile } from 'node:fs/promises'
22
+ import { basename } from 'node:path'
23
+
24
+ import { getSpec } from '../_catalog.js'
25
+ import { classifyProviderError } from '../_errors.js'
26
+ import { computeTranscriptionCost } from '../_pricing.js'
27
+ import { catalogKey, bareOf } from '#core/model-id.js'
28
+
29
+ /**
30
+ * @param {{baseURL: string, responseFormat?: string}} config
31
+ * @returns {(
32
+ * env: import('#core/transcription.js').TranscriptionEnvelope,
33
+ * deps?: {fetch?: typeof fetch, spec?: any}
34
+ * ) => Promise<import('#core/transcription.js').TranscriptionResult>}
35
+ */
36
+ export function createTranscriptionAdapter ({ baseURL, responseFormat }) {
37
+ return async function transcription (envelope, deps = {}) {
38
+ const fetchFn = deps.fetch ?? globalThis.fetch
39
+ const spec = deps.spec ?? getSpec(catalogKey(envelope.model)) ?? {}
40
+ const start = String(process.hrtime.bigint())
41
+
42
+ const audio = await loadAudio(envelope.audio)
43
+
44
+ const form = new FormData()
45
+ form.append('model', spec.model ?? bareOf(envelope.model))
46
+ form.append('file', new Blob([audio.bytes], { type: audio.mimeType }), audio.filename)
47
+ if (responseFormat) form.append('response_format', responseFormat)
48
+ if (envelope.language) form.append('language', envelope.language)
49
+ if (envelope.prompt) form.append('prompt', envelope.prompt)
50
+
51
+ const root = (envelope.auth.baseURL || baseURL).replace(/\/$/, '')
52
+ let res
53
+ try {
54
+ res = await fetchFn(`${root}/audio/transcriptions`, {
55
+ method: 'POST',
56
+ headers: { Authorization: `Bearer ${envelope.auth.key}` },
57
+ body: form
58
+ })
59
+ } catch (e) {
60
+ throw typedError(classifyProviderError(e, envelope.auth?.key).message, 'NET_ERROR', true)
61
+ }
62
+ if (!res.ok) {
63
+ const text = await res.text().catch(() => '')
64
+ throw fromHttpStatus(res.status, 'transcription request failed', text.slice(0, 200))
65
+ }
66
+
67
+ const body = await res.json()
68
+ const durationSeconds = extractDuration(body)
69
+ const tokens = extractTokens(body)
70
+ const cost = computeTranscriptionCost(spec, { durationSeconds, ...tokens })
71
+
72
+ const end = String(process.hrtime.bigint())
73
+ return {
74
+ status: 'completed',
75
+ text: typeof body.text === 'string' ? body.text : '',
76
+ language: typeof body.language === 'string' ? body.language : null,
77
+ durationSeconds,
78
+ ...tokens,
79
+ cost,
80
+ timestamps: { start, first: end, end }
81
+ }
82
+ }
83
+ }
84
+
85
+ /** @param {any} body */
86
+ function extractDuration (body) {
87
+ if (typeof body.duration === 'number') return body.duration
88
+ const u = body.usage
89
+ if (u && typeof u === 'object') {
90
+ if (typeof u.seconds === 'number') return u.seconds
91
+ if (typeof u.prompt_audio_seconds === 'number') return u.prompt_audio_seconds
92
+ }
93
+ return null
94
+ }
95
+
96
+ /** @param {any} body */
97
+ function extractTokens (body) {
98
+ const u = body.usage
99
+ if (!u || typeof u !== 'object') return {}
100
+ const out = {}
101
+ if (typeof u.input_tokens === 'number') out.inputTokens = u.input_tokens
102
+ if (typeof u.output_tokens === 'number') out.outputTokens = u.output_tokens
103
+ return out
104
+ }
105
+
106
+ // Multipart filename drives format sniffing on the provider side, so
107
+ // data: URIs need an extension synthesized from the MIME subtype.
108
+ const EXT_BY_MIME = {
109
+ 'audio/mpeg': 'mp3',
110
+ 'audio/mp4': 'm4a',
111
+ 'audio/x-m4a': 'm4a',
112
+ 'audio/wav': 'wav',
113
+ 'audio/x-wav': 'wav',
114
+ 'audio/webm': 'webm',
115
+ 'audio/flac': 'flac',
116
+ 'audio/x-flac': 'flac',
117
+ 'audio/ogg': 'ogg',
118
+ 'audio/opus': 'opus'
119
+ }
120
+
121
+ /**
122
+ * `file://` and `data:` URIs only — providers require multipart
123
+ * upload, so remote `https://` audio would mean mohdel silently
124
+ * downloading arbitrary URLs; the caller owns that step.
125
+ *
126
+ * @param {import('#core/transcription.js').AudioRef} audio
127
+ * @returns {Promise<{bytes: Buffer, mimeType: string, filename: string}>}
128
+ */
129
+ export async function loadAudio (audio) {
130
+ if (!audio?.fileUri || !audio?.mimeType) {
131
+ throw typedError('transcription requires audio {fileUri, mimeType}', 'SESSION_INVALID_AUDIO', false)
132
+ }
133
+ const { fileUri, mimeType } = audio
134
+ if (fileUri.startsWith('file://')) {
135
+ const path = fileUri.replace(/^file:\/\//, '')
136
+ let bytes
137
+ try {
138
+ bytes = await readFile(path)
139
+ } catch (e) {
140
+ throw typedError('audio file unreadable', 'SESSION_INVALID_AUDIO', false, messageOf(e))
141
+ }
142
+ return { bytes, mimeType, filename: basename(path) }
143
+ }
144
+ if (fileUri.startsWith('data:')) {
145
+ const parts = fileUri.split(',')
146
+ if (parts.length < 2) {
147
+ throw typedError('malformed audio data URI', 'SESSION_INVALID_AUDIO', false)
148
+ }
149
+ const ext = EXT_BY_MIME[mimeType] || mimeType.split('/').pop() || 'bin'
150
+ return { bytes: Buffer.from(parts[1], 'base64'), mimeType, filename: `audio.${ext}` }
151
+ }
152
+ throw typedError(
153
+ `unsupported audio URI scheme: ${fileUri.slice(0, 32)}…`,
154
+ 'SESSION_INVALID_AUDIO',
155
+ false
156
+ )
157
+ }
158
+
159
+ function fromHttpStatus (status, message, detail) {
160
+ const typed = classifyProviderError({ status })
161
+ // Keep the classifier's message (stable/machine-readable); response
162
+ // body snippets go to `detail` only (F45).
163
+ return typedError(typed.message, typed.type, typed.retryable, detail ? `${message}: ${detail}` : message)
164
+ }
165
+
166
+ function typedError (message, type, retryable, detail) {
167
+ const err = new Error(message)
168
+ const typed = { message, severity: retryable ? 'warn' : 'error', retryable, type }
169
+ if (detail) typed.detail = detail
170
+ err.typed = typed
171
+ return err
172
+ }
173
+
174
+ /** @param {unknown} e */
175
+ function messageOf (e) {
176
+ return e instanceof Error ? e.message : String(e)
177
+ }
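The response-shape handling described in the adapter's header comment is easier to see against sample bodies. The sketch below copies `extractDuration` and runs it over four hand-written responses; the bodies are illustrative assumptions modeled on the shapes the comment names, not captured provider output.

```javascript
// Copy of the adapter's duration extraction.
function extractDuration (body) {
  if (typeof body.duration === 'number') return body.duration
  const u = body.usage
  if (u && typeof u === 'object') {
    if (typeof u.seconds === 'number') return u.seconds
    if (typeof u.prompt_audio_seconds === 'number') return u.prompt_audio_seconds
  }
  return null
}

// Hand-written sample bodies, one per shape named in the header comment.
const samples = {
  whisperVerboseJson: { text: 'hi', language: 'english', duration: 12.5 },
  openaiDurationUsage: { text: 'hi', usage: { type: 'duration', seconds: 12 } },
  mistralVoxtral: { text: 'hi', usage: { prompt_audio_seconds: 12 } },
  tokenBilled: { text: 'hi', usage: { type: 'tokens', input_tokens: 80, output_tokens: 20 } }
}

for (const [name, body] of Object.entries(samples)) {
  console.log(name, extractDuration(body))
}
```

The token-billed body yields `null`, which is the case where `computeTranscriptionCost` falls back to token pricing.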
@@ -16,6 +16,7 @@ import readline from 'node:readline'

  import { run } from './run.js'
  import { runImage } from './run_image.js'
+ import { runTranscription } from './run_transcription.js'
  import { setCatalog } from './adapters/_catalog.js'

  // Bounded memory for pre-dequeue cancels. Hostile/buggy supervisors
@@ -148,6 +149,16 @@ export async function drive (stdin, stdout) {
        } else {
          stdout.write(JSON.stringify({ type: 'error', error: out.error }) + '\n')
        }
+     } else if (envelope.op === 'transcription') {
+       // Same one-shot contract as the image path; shape matches
+       // `js/core/transcription.js` after the tag strip.
+       const { op: _op, ...trEnv } = envelope
+       const out = await runTranscription(trEnv)
+       if (out.ok) {
+         stdout.write(JSON.stringify({ type: 'transcription_done', result: out.result }) + '\n')
+       } else {
+         stdout.write(JSON.stringify({ type: 'error', error: out.error }) + '\n')
+       }
      } else {
        for await (const ev of run(envelope, { signal: controller.signal })) {
          stdout.write(JSON.stringify(ev) + '\n')
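
The transcription branch above follows a one-shot contract: strip the `op` routing tag, run the request, then write exactly one JSON line, either `transcription_done` or `error`. A small sketch of that framing, assuming the outcome has the `{ ok, result }` / `{ ok, error }` shape. `frameOneShot` is a hypothetical helper name and the sample values are made up for illustration:

```javascript
// Hypothetical helper: turn a one-shot outcome into a single NDJSON line.
function frameOneShot (doneType, out) {
  const message = out.ok
    ? { type: doneType, result: out.result }
    : { type: 'error', error: out.error }
  return JSON.stringify(message) + '\n'
}

// Strip the routing tag before handing the envelope to the runner.
const envelope = { op: 'transcription', model: 'groq/whisper-large-v3-turbo' }
const { op: _op, ...trEnv } = envelope

const okLine = frameOneShot('transcription_done', { ok: true, result: { text: 'hello' } })
const errLine = frameOneShot('transcription_done', {
  ok: false,
  error: { message: 'boom', severity: 'error', retryable: false, type: 'SESSION_INVALID_AUDIO' }
})
process.stdout.write(okLine + errLine)
```

Because each reply is a single line, a supervisor reading stdout can parse it line by line without tracking stream state.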
@@ -0,0 +1,64 @@
+ /**
+  * Dispatch a TranscriptionEnvelope to the matching transcription
+  * adapter.
+  *
+  * Transcription is a single request/response — no streaming — so
+  * this returns a Promise of `TranscriptionResult` rather than an
+  * event generator. On adapter failure, the resolved error is a
+  * `TypedError` (structured, serializable) rather than a thrown JS
+  * `Error`.
+  *
+  * Like the image path, transcription skips rate-limit and cooldown —
+  * low-frequency one-shots that don't justify the per-call tracking
+  * overhead.
+  *
+  * @module session/run_transcription
+  */
+
+ import { getTranscriptionAdapter } from './adapters/transcription/index.js'
+ import { classifyProviderError } from './adapters/_errors.js'
+ import { providerOf } from '#core/model-id.js'
+
+ /**
+  * @param {import('#core/transcription.js').TranscriptionEnvelope} envelope
+  * @param {{
+  *   resolveAdapter?: (provider: string) => (
+  *     env: import('#core/transcription.js').TranscriptionEnvelope,
+  *     deps?: any
+  *   ) => Promise<import('#core/transcription.js').TranscriptionResult>,
+  *   spec?: any
+  * }} [options]
+  * @returns {Promise<
+  *   | {ok: true, result: import('#core/transcription.js').TranscriptionResult}
+  *   | {ok: false, error: import('#core/errors.js').TypedError}
+  * >}
+  */
+ export async function runTranscription (envelope, { resolveAdapter = getTranscriptionAdapter, spec } = {}) {
+   let adapter
+   try {
+     adapter = resolveAdapter(providerOf(envelope.model))
+   } catch (e) {
+     return {
+       ok: false,
+       error: {
+         message: messageOf(e),
+         severity: 'error',
+         retryable: false,
+         type: 'SESSION_UNKNOWN_PROVIDER'
+       }
+     }
+   }
+
+   try {
+     const result = await adapter(envelope, spec ? { spec } : {})
+     return { ok: true, result }
+   } catch (e) {
+     const typed = /** @type {any} */(e).typed || classifyProviderError(e, envelope.auth?.key)
+     return { ok: false, error: typed }
+   }
+ }
+
+ /** @param {unknown} e */
+ function messageOf (e) {
+   return e instanceof Error ? e.message : String(e)
+ }
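
On failure, `runTranscription` prefers a `typed` payload already attached to the thrown error and only falls back to the classifier otherwise. A sketch of that precedence, written synchronously for brevity (the real function is async). `toTyped`, `settle`, and `fallbackClassifier` are hypothetical stand-ins; the real `classifyProviderError` is not shown in this diff, so its behaviour and the `PROVIDER_ERROR` type name are assumptions:

```javascript
// Hypothetical stand-in for the provider-error classifier (behaviour assumed).
function fallbackClassifier (e) {
  return {
    message: e instanceof Error ? e.message : String(e),
    severity: 'error',
    retryable: false,
    type: 'PROVIDER_ERROR' // illustrative type name, not taken from the package
  }
}

// Prefer a structured `typed` payload attached by the adapter.
function toTyped (e) {
  return (e && e.typed) || fallbackClassifier(e)
}

// Mirror the `{ ok, result } | { ok, error }` union for a function that may throw.
function settle (fn) {
  try {
    return { ok: true, result: fn() }
  } catch (e) {
    return { ok: false, error: toTyped(e) }
  }
}

const tagged = Object.assign(new Error('audio file unreadable'), {
  typed: { message: 'audio file unreadable', severity: 'error', retryable: false, type: 'SESSION_INVALID_AUDIO' }
})
const fromTagged = settle(() => { throw tagged })
const fromPlain = settle(() => { throw new Error('plain') })
const success = settle(() => ({ text: 'hi' }))
console.log(fromTagged.error.type, fromPlain.error.type, success.ok)
```

Returning a plain object instead of throwing keeps the error serializable, which fits a caller that writes it straight to a JSON line.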
package/package.json CHANGED
@@ -1,12 +1,12 @@
  {
    "name": "mohdel",
-   "version": "0.110.0",
+   "version": "0.112.0",
    "license": "MIT",
    "author": {
      "name": "Christophe Le Bars",
      "email": "clb@toort.net"
    },
-   "description": "Self-hosted LLM gateway with an embeddable SDK. Process-isolated, OpenTelemetry-native inference across 11 providers (streaming, tools, thinking control) without orchestration. Use the Node factory in-process, or run thin-gate for fault isolation and any-language HTTP callers.",
+   "description": "Self-hosted LLM gateway and SDK for Node: a LiteLLM-style unified API for 11 providers (Anthropic, OpenAI, Gemini, Mistral, Groq, xAI, DeepSeek, OpenRouter, …) with per-call USD cost tracking, streaming, tool calls, vision, speech-to-text, and built-in OpenTelemetry. Run in-process, or behind the process-isolated thin-gate for fault containment.",
    "type": "module",
    "repository": {
      "type": "git",
@@ -87,10 +87,10 @@
      "@opentelemetry/exporter-trace-otlp-grpc": "^0.218.0",
      "@opentelemetry/sdk-node": "^0.218.0",
      "chalk": "^5.4.0",
-     "mohdel-thin-gate-linux-x64-gnu": "0.110.0"
+     "mohdel-thin-gate-linux-x64-gnu": "0.112.0"
    },
    "dependencies": {
-     "@anthropic-ai/sdk": "^0.102.0",
+     "@anthropic-ai/sdk": "^0.104.1",
      "@cerebras/cerebras_cloud_sdk": "^1.61.1",
      "@google/genai": "^2.8.0",
      "@opentelemetry/api": "^1.9.1",
package/src/cli/ask.js CHANGED
@@ -5,8 +5,8 @@ const noop = () => {}

  // Friendly next-step hints for common ask-time failures. Pure pattern match on
  // err.message — keeps the lib layer neutral, but gives CLI users a copy-pasteable
- // command instead of just an error.
- const hintsForError = (err, modelId) => {
+ // command instead of just an error. Shared with `mo transcribe`.
+ export const hintsForError = (err, modelId) => {
    const msg = String(err?.message || '')
    const detail = String(err?.detail || '')
    const both = `${msg}\n${detail}`
package/src/cli/index.js CHANGED
@@ -55,6 +55,7 @@ Commands:
    ratelimit provider rm <p>          Remove provider-level limits

    ask <provider/model> [prompt]      One-shot inference (pipeable)
+   transcribe <provider/model> <file> Speech → text from an audio file

    default                            Set default model (interactive)
    doctor                             Check that your install is wired up
@@ -133,6 +134,9 @@ if (resolved === 'default') {
  } else if (resolved === 'ask') {
    const { runAsk } = await import('./ask.js')
    await runAsk(resolvedArgs)
+ } else if (resolved === 'transcribe') {
+   const { runTranscribe } = await import('./transcribe.js')
+   await runTranscribe(resolvedArgs)
  } else if (resolved === 'model') {
    const { runModel } = await import('./model.js')
    await runModel(resolvedArgs)
package/src/cli/model.js CHANGED
@@ -269,7 +269,7 @@ Examples:
  Required fields (asked if not pre-filled):
    model        the literal id sent to the provider's API
    creator      who trained the model (e.g. anthropic, openai, alibaba)
-   inputFormat  subset of [text, image, video]
+   inputFormat  subset of [text, image, video, audio]

  See docs/CATALOG.md for the full field reference, and
  config/curated.example.json for ready-to-copy entries.`)
@@ -0,0 +1,145 @@
+ import { resolve, extname } from 'node:path'
+
+ import mohdel, { silent } from '../lib/index.js'
+ import { loadDefaultEnv } from '../lib/common.js'
+ import { hintsForError } from './ask.js'
+
+ const noop = () => {}
+
+ const MIME_BY_EXT = {
+   '.mp3': 'audio/mpeg',
+   '.mpga': 'audio/mpeg',
+   '.m4a': 'audio/mp4',
+   '.mp4': 'audio/mp4',
+   '.wav': 'audio/wav',
+   '.webm': 'audio/webm',
+   '.flac': 'audio/flac',
+   '.ogg': 'audio/ogg',
+   '.opus': 'audio/opus'
+ }
+
+ export async function runTranscribe (args) {
+   if (args.includes('-h') || args.includes('--help')) {
+     console.log(`mohdel transcribe — speech → text, pipeable
+
+ Usage:
+   mo transcribe <model> <audio-file>
+
+ Options:
+   --language <iso>   ISO-639-1 language hint (e.g. en, fr)
+   --prompt <text>    Spelling/context hint forwarded to the provider
+   --mime <type>      Override the MIME type guessed from the extension
+   --json             Output full result as JSON
+   -v, --verbose      Show debug info on stderr
+
+ Output:
+   stdout: transcript text (raw — or JSON with --json)
+   stderr: model name + duration/cost summary
+
+ Known extensions: ${Object.keys(MIME_BY_EXT).join(' ')}
+
+ Examples:
+   mo transcribe groq/whisper-large-v3-turbo meeting.mp3
+   mo transcribe mistral/voxtral-mini-transcribe interview.wav --language fr
+   mo transcribe groq/whisper-large-v3-turbo memo.m4a --json | jq .cost`)
+     process.exit(0)
+   }
+
+   loadDefaultEnv()
+
+   const flagVal = (name) => {
+     const idx = args.indexOf(name)
+     if (idx === -1) return undefined
+     const val = args[idx + 1]
+     args.splice(idx, 2)
+     return val
+   }
+   const flag = (name) => {
+     const idx = args.indexOf(name)
+     if (idx === -1) return false
+     args.splice(idx, 1)
+     return true
+   }
+
+   const json = flag('--json')
+   const verbose = flag('--verbose') || flag('-v')
+   const language = flagVal('--language')
+   const prompt = flagVal('--prompt')
+   const mimeOverride = flagVal('--mime')
+
+   const [modelId, file] = args
+   if (!modelId || !file) {
+     console.error('Usage: mo transcribe <model> <audio-file>')
+     process.exit(1)
+   }
+
+   const mimeType = mimeOverride || MIME_BY_EXT[extname(file).toLowerCase()]
+   if (!mimeType) {
+     console.error(`Unknown audio extension '${extname(file)}'. Pass --mime <type> (e.g. --mime audio/mpeg).`)
+     process.exit(1)
+   }
+
+   const log = verbose ? (...args) => process.stderr.write(`${args.map(a => typeof a === 'string' ? a : JSON.stringify(a)).join(' ')}\n`) : noop
+   const logger = {
+     ...silent,
+     debug: verbose ? log : noop,
+     info: log,
+     warn: log,
+     error: log,
+     fatal: log
+   }
+   const mo = await mohdel({ logger })
+
+   let model
+   try {
+     model = mo.use(modelId)
+   } catch (err) {
+     console.error(err.message)
+     for (const h of hintsForError(err, modelId)) console.error(h)
+     process.exit(1)
+   }
+
+   const options = {}
+   if (language) options.language = language
+   if (prompt) options.prompt = prompt
+
+   process.stderr.write(`${model.id}\n`)
+
+   try {
+     const result = await model.transcribe(
+       { fileUri: `file://${resolve(file)}`, mimeType },
+       options
+     )
+
+     if (json) {
+       console.log(JSON.stringify({
+         model: model.id,
+         text: result.text,
+         language: result.language,
+         durationSeconds: result.durationSeconds,
+         inputTokens: result.inputTokens,
+         outputTokens: result.outputTokens,
+         cost: result.cost ?? null,
+         status: result.status
+       }, null, 2))
+     } else {
+       process.stdout.write(result.text)
+       if (result.text && !result.text.endsWith('\n')) process.stdout.write('\n')
+     }
+
+     const summary = []
+     if (result.durationSeconds != null) summary.push(`${result.durationSeconds}s audio`)
+     if (result.inputTokens) summary.push(`${result.inputTokens} in`)
+     if (result.outputTokens) summary.push(`${result.outputTokens} out`)
+     if (result.cost != null) summary.push(`$${result.cost.toFixed(4)}`)
+     const ts = result.timestamps
+     if (ts?.start && ts?.end) {
+       summary.push(`${Math.round(Number(BigInt(ts.end) - BigInt(ts.start)) / 1e6)}ms total`)
+     }
+     if (summary.length) process.stderr.write(`${summary.join(', ')}\n`)
+   } catch (err) {
+     console.error(`Error: ${err.detail || err.message}`)
+     for (const h of hintsForError(err, modelId)) console.error(h)
+     process.exit(1)
+   }
+ }
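
The summary line above computes total latency from `result.timestamps` with `BigInt` arithmetic and a division by 1e6, which suggests nanosecond values. A sketch under that assumption (nanosecond timestamps carried as strings or bigints; `elapsedMs` is a hypothetical helper and the sample values are made up). Nanosecond epoch values exceed `Number.MAX_SAFE_INTEGER`, so the subtraction is done in `BigInt` and only the small difference is converted to a `Number`:

```javascript
// Hypothetical helper: elapsed milliseconds between two nanosecond timestamps.
function elapsedMs (start, end) {
  // Subtract as BigInt so no precision is lost on the large absolute values.
  const deltaNs = BigInt(end) - BigInt(start)
  // The difference is small enough to convert safely; 1e6 ns = 1 ms.
  return Math.round(Number(deltaNs) / 1e6)
}

const start = '1700000000000000000'
const end = '1700000001234567890'
console.log(`${elapsedMs(start, end)}ms total`)
```

Converting each timestamp to a `Number` before subtracting would round both operands first, which can distort short intervals.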
package/src/lib/index.js CHANGED
@@ -14,7 +14,7 @@ import {
  } from './curated-cache.js'
  import { createRateLimiter } from '../../js/session/_rate_limiter.js'
  import { createCooldownTracker } from '../../js/session/_cooldown.js'
- import { runAnswer, runAnswerImage } from '../../js/factory/bridge.js'
+ import { runAnswer, runAnswerImage, runAnswerTranscription } from '../../js/factory/bridge.js'
  import { startSpan, endSpanOk, endSpanError } from './tracing.js'
  import { isValidTag } from './schema.js'
  import { silent } from './logger.js'
@@ -644,6 +644,20 @@ const createModelProxy = (resolvedModelId, modelSpec, handlers, aliasOutputEffor
          }
        }

+       if (prop === 'transcribe') {
+         return async (audio, options = {}) => {
+           const { configuration } = await getRuntime()
+           return runAnswerTranscription({
+             provider: modelSpec.provider,
+             model: modelSpec.model ?? resolvedModelId.split('/').pop(),
+             configuration,
+             audio,
+             options,
+             spec: modelSpec
+           })
+         }
+       }
+
        if (prop === 'setRateLimit') {
          return async ({ rpm, tpm } = {}) => {
            const curatedCache = getCuratedCacheSnapshot()
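
`model.transcribe` passes the `audio` argument through unchanged, and the `file://` reader earlier in this diff strips the prefix with a plain regex, without percent-decoding. A sketch of what that implies for SDK callers, assuming the same reader handles SDK calls. `audioFromAbsolutePath` and `pathFromFileUri` are hypothetical helper names, and the path is made up:

```javascript
// Build the audio argument the way the CLI does: raw absolute path after `file://`.
function audioFromAbsolutePath (absPath, mimeType) {
  return { fileUri: `file://${absPath}`, mimeType }
}

// Mirror of the reader's prefix strip (no percent-decoding is applied).
function pathFromFileUri (fileUri) {
  return fileUri.replace(/^file:\/\//, '')
}

const audio = audioFromAbsolutePath('/tmp/team sync.m4a', 'audio/mp4')
const roundTripped = pathFromFileUri(audio.fileUri)
// A percent-encoded URI keeps its escapes, so the path would not match the file on disk.
const encoded = pathFromFileUri('file:///tmp/team%20sync.m4a')
console.log(roundTripped)
console.log(encoded)
```

Under this assumption, paths containing spaces or other reserved characters are safer passed raw, as the CLI does with `resolve(file)`, than produced by a URL-encoding helper.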
package/src/lib/schema.js CHANGED
@@ -27,6 +27,7 @@ const fieldDefs = {
    imagePrice: { type: 'number' },
    imageEndpoint: { type: 'string' },
    imageDefaultSize: { type: 'string' },
+   transcriptionPrice: { type: 'number' },
    deprecated: { type: 'string' },
    suspended: { type: 'string' },
    rpmLimit: { type: 'number' },