mohdel 0.110.0 → 0.111.0
- package/README.md +57 -19
- package/config/curated.example.json +12 -0
- package/config/curated.schema.json +4 -3
- package/js/core/transcription.js +70 -0
- package/js/factory/bridge.js +34 -0
- package/js/session/adapters/_pricing.js +32 -0
- package/js/session/adapters/transcription/fake.js +53 -0
- package/js/session/adapters/transcription/index.js +58 -0
- package/js/session/adapters/transcription/openai_compatible.js +177 -0
- package/js/session/run_transcription.js +64 -0
- package/package.json +4 -4
- package/src/cli/ask.js +2 -2
- package/src/cli/index.js +4 -0
- package/src/cli/model.js +1 -1
- package/src/cli/transcribe.js +145 -0
- package/src/lib/index.js +15 -1
- package/src/lib/schema.js +1 -0
package/README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Mohdel
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
Self-hosted LLM gateway and SDK for Node — think LiteLLM, for the JS world. One `answer()` call for 11 providers; swap models by changing one string; get real per-call USD cost back on every result, with OpenTelemetry built in and process isolation when you need it. Your keys, your infra, no SaaS proxy in the path.
|
|
4
4
|
|
|
5
5
|
```bash
|
|
6
6
|
npm install -g mohdel
|
|
@@ -12,15 +12,49 @@ Providers: Anthropic, OpenAI, Gemini, Mistral, Groq, xAI, Cerebras, Fireworks, D
|
|
|
12
12
|
|
|
13
13
|
## Why mohdel
|
|
14
14
|
|
|
15
|
+
- **Real numbers on every call.** Token counts and per-call USD cost computed from your own pricing catalog (`curated.json`) — not estimates, not provider-specific shapes. Bill tenants, alert on spend, reconcile invoices. See [docs/CATALOG.md](docs/CATALOG.md) for the catalog format.
|
|
15
16
|
- **One interface across providers.** Same `answer()` call, same event stream, same `{ status, output, inputTokens, outputTokens, cost }` result. Switching from `anthropic/claude-sonnet-4-6` to `openai/gpt-5.4-mini` is one string change — adapter differences stay inside mohdel.
|
|
16
|
-
- **
|
|
17
|
+
- **Self-hosted, no vendor in the path.** API keys live in `~/.config/mohdel/`. Mohdel calls provider APIs directly; nothing routes through a third party, nothing marks up your tokens, no extra hop of availability risk.
|
|
17
18
|
- **Observability without instrumentation.** OpenTelemetry spans, trace-linked logs, and OTLP metrics over one endpoint. Set `OTEL_EXPORTER_OTLP_ENDPOINT`; everything else is wired.
|
|
18
19
|
- **Two integration paths, same API.** In-process factory for CLI tools, scripts, single-process services. Optional `thin-gate` subprocess for fault isolation, cross-process quota, and any-language HTTP callers — no code change to switch.
|
|
19
|
-
|
|
20
|
+
|
|
21
|
+
## How it compares
|
|
22
|
+
|
|
23
|
+
The one-paragraph version: **LiteLLM** is the closest analog but lives in
|
|
24
|
+
Python; **Vercel AI SDK** is an application toolkit, not an infra layer;
|
|
25
|
+
**OpenRouter** is the same one-API promise as a SaaS in your request path;
|
|
26
|
+
**raw provider SDKs** are N different shapes with no cost accounting.
|
|
27
|
+
|
|
28
|
+
| | mohdel | LiteLLM | Vercel AI SDK | OpenRouter | Raw SDKs |
|
|
29
|
+
|---|---|---|---|---|---|
|
|
30
|
+
| Runs in a Node stack natively | yes | Python service | yes | n/a (SaaS) | yes |
|
|
31
|
+
| Per-call USD cost on the result | yes | yes | no | yes | no |
|
|
32
|
+
| Self-hosted, keys never leave your infra | yes | yes | yes | no | yes |
|
|
33
|
+
| Provider-SDK process isolation | yes (thin-gate) | proxy only | no | n/a | no |
|
|
34
|
+
| OTel spans + metrics out of the box | yes | via callbacks | no | no | no |
|
|
35
|
+
| UI streaming helpers, structured output, agents | no — by design | no | yes | no | varies |
|
|
36
|
+
|
|
37
|
+
- **vs LiteLLM** — same core promise (unified calls, cost tracking,
|
|
38
|
+
self-hosted gateway), but Node-native: if your stack is JS, there's no
|
|
39
|
+
Python sidecar to deploy, version, and monitor. The honest gap: LiteLLM's
|
|
40
|
+
proxy exposes an OpenAI-compatible endpoint and admin features (virtual
|
|
41
|
+
keys, budgets); thin-gate speaks its own [wire protocol](PROTOCOL.md) —
|
|
42
|
+
callers use the JS client or implement the protocol.
|
|
43
|
+
- **vs Vercel AI SDK** — different layer, not a rival. The AI SDK is an
|
|
44
|
+
application toolkit (UI streaming, structured outputs, agent loops) with no
|
|
45
|
+
per-call cost, no gateway, no process isolation. Use it *above* mohdel if
|
|
46
|
+
you like it — mohdel is the inference primitive underneath.
|
|
47
|
+
- **vs OpenRouter** — the self-hosted version of the same idea. With a SaaS
|
|
48
|
+
router you accept their uptime, their markup, and your prompts transiting
|
|
49
|
+
their infra. Mohdel goes direct to providers with your keys — and ships an
|
|
50
|
+
`openrouter` adapter for when you want both.
|
|
51
|
+
- **vs raw provider SDKs** — no abstraction tax to escape later: mohdel's
|
|
52
|
+
envelope is flat and close to the SDKs underneath, and `cost`/`tokens`
|
|
53
|
+
come back normalized so you never parse five different usage shapes.
|
|
20
54
|
|
|
21
55
|
## Documentation
|
|
22
56
|
|
|
23
|
-
- [INTEGRATION.md](INTEGRATION.md) — JS library guide (factory, client, answer options, tools, streaming, vision, errors, OTel)
|
|
57
|
+
- [INTEGRATION.md](INTEGRATION.md) — JS library guide (factory, client, answer options, tools, streaming, vision, transcription, errors, OTel)
|
|
24
58
|
- [docs/COOKBOOK.md](docs/COOKBOOK.md) — copy-paste recipes (summarize a file, stream, swap providers, tools, vision, batch + cost)
|
|
25
59
|
- [docs/CATALOG.md](docs/CATALOG.md) — `curated.json` walkthrough with worked examples
|
|
26
60
|
- [docs/GLOSSARY.md](docs/GLOSSARY.md) — short definitions for envelope, thin-gate, session, creator vs provider, status, …
|
|
@@ -67,6 +101,10 @@ mo ask anthropic/claude-sonnet-4-6 --stream "write a haiku about recursion"
|
|
|
67
101
|
# With thinking effort
|
|
68
102
|
mo ask anthropic/claude-opus-4-6 --effort high "prove P != NP"
|
|
69
103
|
|
|
104
|
+
# Speech → text from an audio file
|
|
105
|
+
mo transcribe groq/whisper-large-v3-turbo meeting.mp3
|
|
106
|
+
mo transcribe mistral/voxtral-mini-transcribe interview.wav --language fr
|
|
107
|
+
|
|
70
108
|
# Browse the model catalog
|
|
71
109
|
mo ls # list all curated models
|
|
72
110
|
mo ls --sort price # sorted by input price
|
|
@@ -101,9 +139,21 @@ All list/show commands support `--json [fields]` — bare `--json` lists availab
|
|
|
101
139
|
|
|
102
140
|
## Library Usage
|
|
103
141
|
|
|
104
|
-
Two integration paths: the **
|
|
142
|
+
Two integration paths, same adapters underneath: start with the in-process **factory**; graduate to the cross-process **client** when you want gateway-grade isolation.
|
|
143
|
+
|
|
144
|
+
### Factory — in-process (start here)
|
|
145
|
+
|
|
146
|
+
```js
|
|
147
|
+
import mohdel from 'mohdel'
|
|
148
|
+
|
|
149
|
+
const mo = await mohdel()
|
|
150
|
+
const result = await mo.use('anthropic/claude-sonnet-4-6').answer('Hello')
|
|
151
|
+
console.log(result.output, result.cost)
|
|
152
|
+
```
|
|
153
|
+
|
|
154
|
+
No subprocess, no setup beyond your API key. Right for CLI tools (`mo ask`), scripts, tests, and single-process services — which is most projects.
|
|
105
155
|
|
|
106
|
-
### Client — cross-process (
|
|
156
|
+
### Client — cross-process (the production gateway)
|
|
107
157
|
|
|
108
158
|
```js
|
|
109
159
|
import { call } from 'mohdel/client'
|
|
@@ -119,19 +169,7 @@ for await (const ev of call(envelope, { socketPath: '/tmp/mohdel-data.sock' }))
|
|
|
119
169
|
}
|
|
120
170
|
```
|
|
121
171
|
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
### Factory — in-process shortcut
|
|
125
|
-
|
|
126
|
-
```js
|
|
127
|
-
import mohdel from 'mohdel'
|
|
128
|
-
|
|
129
|
-
const mo = await mohdel()
|
|
130
|
-
const result = await mo.use('anthropic/claude-sonnet-4-6').answer('Hello')
|
|
131
|
-
console.log(result.output, result.cost)
|
|
132
|
-
```
|
|
133
|
-
|
|
134
|
-
No subprocess; the factory runs the same session adapters inline. Right for CLI (`mo ask`), scripts, tests, single-process services.
|
|
172
|
+
Same API, but inference runs in a pooled subprocess behind the `thin-gate` supervisor (Rust): a crashing provider SDK can't take your service down, quota is enforced across processes, and non-JS callers can speak the same wire. Switching from factory to client is a configuration change, not a rewrite. See [INTEGRATION.md §Client](INTEGRATION.md#client-cross-process--primary-production-integration) for setup.
|
|
135
173
|
|
|
136
174
|
For the full API — initialization, alias resolution, answer options, response shape, tool use, streaming, vision, error handling, OpenTelemetry, sub-path exports — see **[INTEGRATION.md](INTEGRATION.md)**.
|
|
137
175
|
|
|
package/config/curated.example.json
CHANGED
|
@@ -61,6 +61,18 @@
|
|
|
61
61
|
"tags": ["image"]
|
|
62
62
|
},
|
|
63
63
|
|
|
64
|
+
"groq/whisper-large-v3-turbo": {
|
|
65
|
+
"_comment_f": "Transcription entry. type:'transcription' selects the speech-to-text dispatcher (multipart upload, no streaming). transcriptionPrice is per audio MINUTE — Groq's $0.04/hour ≈ $0.000667/min. Token-billed models (openai/gpt-4o-mini-transcribe) use inputPrice/outputPrice instead.",
|
|
66
|
+
"model": "whisper-large-v3-turbo",
|
|
67
|
+
"creator": "openai",
|
|
68
|
+
"provider": "groq",
|
|
69
|
+
"label": "Whisper Large v3 Turbo",
|
|
70
|
+
"inputFormat": ["audio"],
|
|
71
|
+
"type": "transcription",
|
|
72
|
+
"transcriptionPrice": 0.000667,
|
|
73
|
+
"tags": ["transcription", "fast"]
|
|
74
|
+
},
|
|
75
|
+
|
|
64
76
|
"openai/gpt-5.4-mini": {
|
|
65
77
|
"_comment_e": "Entry with custom rate limits. rpmLimit and tpmLimit override the provider-level defaults in providers.json. rateLimitScope:'model' means the limit is per-model; 'provider' means it joins the provider-level pool.",
|
|
66
78
|
"model": "gpt-5.4-mini",
|
|
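The `_comment_f` note in this hunk converts Groq's hourly quote into the per-minute unit that `transcriptionPrice` uses. The arithmetic can be sketched as follows; the helper name `perMinuteFromHourly` and the six-decimal rounding are illustrative assumptions, not part of mohdel.

```js
// Hypothetical helper: convert a provider's USD-per-hour quote into the
// USD-per-audio-minute unit that `transcriptionPrice` expects.
// The six-decimal rounding is an assumption chosen to match the catalog entry.
function perMinuteFromHourly (usdPerHour, decimals = 6) {
  return Number((usdPerHour / 60).toFixed(decimals))
}

const transcriptionPrice = perMinuteFromHourly(0.04)
console.log(transcriptionPrice) // 0.000667, the value used in the example entry

// Estimated cost of a 90-second clip at that rate (unrounded).
const clipCost = (90 / 60) * transcriptionPrice
console.log(clipCost.toFixed(7)) // about 0.0010005 USD
```

Because the catalog value is rounded, costs derived from it can differ from the provider's invoice in the last digits.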
package/config/curated.schema.json
CHANGED
|
@@ -45,7 +45,7 @@
|
|
|
45
45
|
"inputFormat": {
|
|
46
46
|
"type": "array",
|
|
47
47
|
"description": "Accepted input modalities. Defaults to ['text'].",
|
|
48
|
-
"items": { "type": "string", "enum": ["text", "image", "video"] },
|
|
48
|
+
"items": { "type": "string", "enum": ["text", "image", "video", "audio"] },
|
|
49
49
|
"minItems": 1,
|
|
50
50
|
"uniqueItems": true
|
|
51
51
|
},
|
|
@@ -56,9 +56,9 @@
|
|
|
56
56
|
"sdk": { "type": "string", "description": "SDK adapter to use (some providers require this)." },
|
|
57
57
|
"type": {
|
|
58
58
|
"type": "string",
|
|
59
|
-
"enum": ["model", "image"],
|
|
59
|
+
"enum": ["model", "image", "transcription"],
|
|
60
60
|
"default": "model",
|
|
61
|
-
"description": "'model' for chat/completion, 'image' for image generation."
|
|
61
|
+
"description": "'model' for chat/completion, 'image' for image generation, 'transcription' for speech-to-text."
|
|
62
62
|
},
|
|
63
63
|
"label": { "type": "string", "description": "Human-readable name shown in UIs." },
|
|
64
64
|
"description": { "type": "string" },
|
|
@@ -137,6 +137,7 @@
|
|
|
137
137
|
"imagePrice": { "type": "number", "minimum": 0, "description": "USD per generated image (image-type entries)." },
|
|
138
138
|
"imageEndpoint": { "type": "string", "description": "Provider-side image endpoint name." },
|
|
139
139
|
"imageDefaultSize": { "type": "string", "description": "Default size when envelope omits one (e.g. '1024x1024')." },
|
|
140
|
+
"transcriptionPrice": { "type": "number", "minimum": 0, "description": "USD per audio minute (transcription-type entries). Token-billed transcription models (OpenAI gpt-4o-*-transcribe) use inputPrice/outputPrice instead." },
|
|
140
141
|
|
|
141
142
|
"rpmLimit": { "type": "integer", "minimum": 1, "description": "Requests per minute. Overrides provider default." },
|
|
142
143
|
"tpmLimit": { "type": "integer", "minimum": 1, "description": "Tokens per minute. Overrides provider default." },
|
|
package/js/core/transcription.js
ADDED
|
@@ -0,0 +1,70 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Transcription (voice → text) envelope and result.
|
|
3
|
+
*
|
|
4
|
+
* Separate call path from `CallEnvelope` / `AnswerResult`: transcription
|
|
5
|
+
* is a single synchronous request/response (no streaming) against a
|
|
6
|
+
* provider's `/audio/transcriptions` endpoint.
|
|
7
|
+
* Result shape: `{ status, text, language, durationSeconds, cost, timestamps }`.
|
|
8
|
+
*
|
|
9
|
+
* @module core/transcription
|
|
10
|
+
*/
|
|
11
|
+
|
|
12
|
+
/**
|
|
13
|
+
* @typedef {object} AudioRef
|
|
14
|
+
* @property {string} fileUri
|
|
15
|
+
* `file://` or `data:` URI. Remote `https://` audio is not
|
|
16
|
+
* supported — providers require multipart upload, so the caller
|
|
17
|
+
* owns the download.
|
|
18
|
+
* @property {string} mimeType e.g. "audio/mpeg", "audio/wav".
|
|
19
|
+
*/
|
|
20
|
+
|
|
21
|
+
/**
|
|
22
|
+
* @typedef {object} TranscriptionEnvelope
|
|
23
|
+
*
|
|
24
|
+
* @property {string} callId
|
|
25
|
+
* @property {string} authId
|
|
26
|
+
* @property {import('./envelope.js').Auth} auth
|
|
27
|
+
* @property {string} [traceparent]
|
|
28
|
+
* @property {string} [baggage]
|
|
29
|
+
*
|
|
30
|
+
* @property {import('./model-id.js').ModelId} model
|
|
31
|
+
* Full mohdel id — `"<provider>/<bare>"`. Same shape as
|
|
32
|
+
* `CallEnvelope.model` (see `envelope.js`).
|
|
33
|
+
* @property {AudioRef} audio
|
|
34
|
+
*
|
|
35
|
+
* @property {string} [language] ISO-639-1 hint (e.g. "en", "fr").
|
|
36
|
+
* @property {string} [prompt] Spelling/context hint forwarded to the provider.
|
|
37
|
+
*/
|
|
38
|
+
|
|
39
|
+
/**
|
|
40
|
+
* @typedef {object} TranscriptionResult
|
|
41
|
+
*
|
|
42
|
+
* @property {'completed'} status
|
|
43
|
+
* Transcriptions are one-shot — no `incomplete` state.
|
|
44
|
+
* @property {string} text
|
|
45
|
+
* @property {string | null} language
|
|
46
|
+
* Detected (or echoed) language when the provider reports one.
|
|
47
|
+
* @property {number | null} durationSeconds
|
|
48
|
+
* Audio duration as reported by the provider; null when not reported.
|
|
49
|
+
* @property {number} [inputTokens]
|
|
50
|
+
* Present only for token-billed providers (OpenAI gpt-4o-*-transcribe).
|
|
51
|
+
* @property {number} [outputTokens]
|
|
52
|
+
* @property {number} cost
|
|
53
|
+
* USD. `transcriptionPrice` (per audio minute) × duration when the
|
|
54
|
+
* provider reports duration; token pricing fallback otherwise; 0 when
|
|
55
|
+
* the spec carries no usable price.
|
|
56
|
+
* @property {{start: string, first: string, end: string}} timestamps
|
|
57
|
+
* hrtime-bigint-as-string. `first` = `end` (no streaming).
|
|
58
|
+
*/
|
|
59
|
+
|
|
60
|
+
export const TRANSCRIPTION_ENVELOPE_FIELDS = Object.freeze([
|
|
61
|
+
'callId',
|
|
62
|
+
'authId',
|
|
63
|
+
'auth',
|
|
64
|
+
'traceparent',
|
|
65
|
+
'baggage',
|
|
66
|
+
'model',
|
|
67
|
+
'audio',
|
|
68
|
+
'language',
|
|
69
|
+
'prompt'
|
|
70
|
+
])
|
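A minimal sketch of assembling a `TranscriptionEnvelope` from in-memory audio, based on the typedefs in this file. The field list is copied from the diff; `toAudioRef`, `unknownFields`, and every sample value (call id, key, model choice) are hypothetical and only illustrate the shape.

```js
// Copied from core/transcription.js so the sketch is self-contained.
const TRANSCRIPTION_ENVELOPE_FIELDS = Object.freeze([
  'callId', 'authId', 'auth', 'traceparent', 'baggage',
  'model', 'audio', 'language', 'prompt'
])

// Hypothetical helper: wrap raw bytes as a base64 `data:` URI AudioRef.
function toAudioRef (bytes, mimeType) {
  const base64 = Buffer.from(bytes).toString('base64')
  return { fileUri: `data:${mimeType};base64,${base64}`, mimeType }
}

// Hypothetical helper: list envelope keys that are not declared fields.
function unknownFields (envelope) {
  return Object.keys(envelope).filter(k => !TRANSCRIPTION_ENVELOPE_FIELDS.includes(k))
}

const envelope = {
  callId: 'call-123', // sample value
  authId: 'local',
  auth: { key: 'sk-example' }, // placeholder; the full Auth shape lives in envelope.js
  model: 'groq/whisper-large-v3-turbo',
  audio: toAudioRef(new Uint8Array([0x52, 0x49, 0x46, 0x46]), 'audio/wav'),
  language: 'en'
}

console.log(envelope.audio.fileUri) // data:audio/wav;base64,UklGRg==
console.log(unknownFields(envelope)) // []
console.log(unknownFields({ ...envelope, stream: true })) // [ 'stream' ]
```

The four sample bytes are not playable audio; a real call would pass the full file contents, or a `file://` URI instead.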
package/js/factory/bridge.js
CHANGED
|
@@ -22,6 +22,7 @@
|
|
|
22
22
|
|
|
23
23
|
import { run } from '../session/run.js'
|
|
24
24
|
import { runImage } from '../session/run_image.js'
|
|
25
|
+
import { runTranscription } from '../session/run_transcription.js'
|
|
25
26
|
import { MohdelError, Severity } from '../../src/lib/errors.js'
|
|
26
27
|
import { createRealtimeDeltaBuffer } from '../../src/lib/utils.js'
|
|
27
28
|
|
|
@@ -149,6 +150,39 @@ export async function runAnswerImage ({ provider, model, configuration, prompt,
|
|
|
149
150
|
return out.result
|
|
150
151
|
}
|
|
151
152
|
|
|
153
|
+
/**
|
|
154
|
+
* Run a `transcribe()` call through the /session runtime.
|
|
155
|
+
*
|
|
156
|
+
* @param {object} args
|
|
157
|
+
* @param {string} args.provider
|
|
158
|
+
* @param {string} args.model
|
|
159
|
+
* @param {any} args.configuration
|
|
160
|
+
* @param {{fileUri: string, mimeType: string}} args.audio
|
|
161
|
+
* @param {any} [args.options] `language` / `prompt` map onto the
|
|
162
|
+
* envelope; `callId` / `authId` are
|
|
163
|
+
* transport metadata.
|
|
164
|
+
* @param {any} [args.spec] modelSpec passthrough so the adapter
|
|
165
|
+
* picks up `model` and
|
|
166
|
+
* `transcriptionPrice` without
|
|
167
|
+
* re-reading the catalog.
|
|
168
|
+
* @returns {Promise<any>}
|
|
169
|
+
*/
|
|
170
|
+
export async function runAnswerTranscription ({ provider, model, configuration, audio, options = {}, spec }) {
|
|
171
|
+
const envelope = {
|
|
172
|
+
callId: options.callId || newCallId(),
|
|
173
|
+
authId: options.authId || 'local',
|
|
174
|
+
auth: configToAuth(configuration),
|
|
175
|
+
model: `${provider}/${model}`,
|
|
176
|
+
audio
|
|
177
|
+
}
|
|
178
|
+
if (options.language) envelope.language = options.language
|
|
179
|
+
if (options.prompt) envelope.prompt = options.prompt
|
|
180
|
+
|
|
181
|
+
const out = await runTranscription(envelope, spec ? { spec } : {})
|
|
182
|
+
if (!out.ok) throw fromTypedError(out.error, { provider, model })
|
|
183
|
+
return out.result
|
|
184
|
+
}
|
|
185
|
+
|
|
152
186
|
/**
|
|
153
187
|
* @param {object} args
|
|
154
188
|
* @param {string} args.modelKey Mohdel catalog key `<provider>/<bare>`. The
|
|
package/js/session/adapters/_pricing.js
CHANGED
|
@@ -100,6 +100,38 @@ export function costFor (model, usage) {
|
|
|
100
100
|
return computeCost(getSpec(model), usage)
|
|
101
101
|
}
|
|
102
102
|
|
|
103
|
+
/**
|
|
104
|
+
* Cost of a transcription call.
|
|
105
|
+
*
|
|
106
|
+
* Providers bill speech-to-text two ways, and the catalog supports both:
|
|
107
|
+
*
|
|
108
|
+
* - `transcriptionPrice` — flat USD per audio **minute** (Groq,
|
|
109
|
+
* Mistral; the industry quoting unit). Used when the provider
|
|
110
|
+
* reported the audio duration.
|
|
111
|
+
* - token pricing (`inputPrice`/`outputPrice`) — OpenAI's
|
|
112
|
+
* gpt-4o-*-transcribe models report token usage instead of
|
|
113
|
+
* duration; falls through to `computeCost`.
|
|
114
|
+
*
|
|
115
|
+
* Duration wins when both are available. Unknown models or specs
|
|
116
|
+
* without prices return `0` — same graceful degradation as
|
|
117
|
+
* `computeCost`.
|
|
118
|
+
*
|
|
119
|
+
* @param {any} spec Catalog entry, or `undefined`.
|
|
120
|
+
* @param {{durationSeconds?: number | null, inputTokens?: number, outputTokens?: number}} usage
|
|
121
|
+
* @returns {number}
|
|
122
|
+
*/
|
|
123
|
+
export function computeTranscriptionCost (spec, usage) {
|
|
124
|
+
if (!spec) return 0
|
|
125
|
+
const seconds = usage.durationSeconds
|
|
126
|
+
if (typeof seconds === 'number' && seconds > 0 && typeof spec.transcriptionPrice === 'number') {
|
|
127
|
+
return round((seconds / 60) * spec.transcriptionPrice)
|
|
128
|
+
}
|
|
129
|
+
if (usage.inputTokens || usage.outputTokens) {
|
|
130
|
+
return computeCost(spec, { inputTokens: usage.inputTokens, outputTokens: usage.outputTokens })
|
|
131
|
+
}
|
|
132
|
+
return 0
|
|
133
|
+
}
|
|
134
|
+
|
|
103
135
|
/**
|
|
104
136
|
* Test convenience: inject pricing-only specs by model id. Wraps
|
|
105
137
|
* `setCatalog` with the `{input, output, thinking?}` shape used in
|
|
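The precedence described in the JSDoc above (duration first, token pricing second, otherwise `0`) can be exercised with a self-contained sketch. The branching mirrors `computeTranscriptionCost` as added in this hunk, but `round` and `computeCost` are stand-ins: the hunk does not show them, so the 8-decimal rounding and the USD-per-million-token unit are assumptions.

```js
// Stand-in for the module's `round` helper, which this hunk does not show.
// Assumption: round to 8 decimal places.
const round = (usd) => Number(usd.toFixed(8))

// Stand-in for `computeCost`. Assumption: prices are USD per million tokens.
function computeCost (spec, { inputTokens = 0, outputTokens = 0 }) {
  const usd = (inputTokens * (spec.inputPrice ?? 0) + outputTokens * (spec.outputPrice ?? 0)) / 1e6
  return round(usd)
}

// Same branching as the function added in this hunk.
function computeTranscriptionCost (spec, usage) {
  if (!spec) return 0
  const seconds = usage.durationSeconds
  if (typeof seconds === 'number' && seconds > 0 && typeof spec.transcriptionPrice === 'number') {
    return round((seconds / 60) * spec.transcriptionPrice)
  }
  if (usage.inputTokens || usage.outputTokens) {
    return computeCost(spec, { inputTokens: usage.inputTokens, outputTokens: usage.outputTokens })
  }
  return 0
}

const perMinute = { transcriptionPrice: 0.000667 }
const perToken = { inputPrice: 1.25, outputPrice: 5 } // sample prices, not real quotes
const tokens = { inputTokens: 1000, outputTokens: 200 }

const byDuration = computeTranscriptionCost(perMinute, { durationSeconds: 90 })
const byTokens = computeTranscriptionCost(perToken, tokens)
const both = computeTranscriptionCost({ ...perMinute, ...perToken }, { durationSeconds: 90, ...tokens })
const noPrice = computeTranscriptionCost({}, { durationSeconds: 90 })

console.log(byDuration) // 1.5 min x 0.000667 USD/min
console.log(byTokens)   // token fallback
console.log(both === byDuration) // true: duration wins when both are present
console.log(noPrice)    // 0
```

One consequence of this ordering: a spec that carries only token prices still yields `0` when the provider reports duration but no token usage.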
package/js/session/adapters/transcription/fake.js
ADDED
|
@@ -0,0 +1,53 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Fake transcription adapter — scenario-driven for tests and bug
|
|
3
|
+
* reproductions. Never calls a real API.
|
|
4
|
+
*
|
|
5
|
+
* Mirrors the `fake` image adapter shape: the envelope's `prompt`
|
|
6
|
+
* field carries a JSON scenario spec; the `mode` key picks a
|
|
7
|
+
* behavior. Missing / non-JSON prompts fall through to `mode: "ok"`.
|
|
8
|
+
*
|
|
9
|
+
* ## Modes
|
|
10
|
+
*
|
|
11
|
+
* | mode | params | behavior |
|
|
12
|
+
* |---------|---------------------------------|--------------------------------|
|
|
13
|
+
* | `ok` | `text?`, `durationSeconds?` | returns a canned transcription |
|
|
14
|
+
* | `error` | `type`, `message` | throws a tagged error |
|
|
15
|
+
*
|
|
16
|
+
* @module session/adapters/transcription/fake
|
|
17
|
+
*/
|
|
18
|
+
|
|
19
|
+
/**
|
|
20
|
+
* @param {import('#core/transcription.js').TranscriptionEnvelope} envelope
|
|
21
|
+
* @returns {Promise<import('#core/transcription.js').TranscriptionResult>}
|
|
22
|
+
*/
|
|
23
|
+
export async function fakeTranscription (envelope) {
|
|
24
|
+
const scenario = parseScenario(envelope.prompt)
|
|
25
|
+
const mode = scenario.mode ?? 'ok'
|
|
26
|
+
|
|
27
|
+
if (mode === 'error') {
|
|
28
|
+
const err = new Error(scenario.message || 'fake transcription error')
|
|
29
|
+
err.typed = {
|
|
30
|
+
message: scenario.message || 'fake transcription error',
|
|
31
|
+
severity: 'error',
|
|
32
|
+
retryable: !!scenario.retryable,
|
|
33
|
+
type: scenario.type || 'PROVIDER_ERROR'
|
|
34
|
+
}
|
|
35
|
+
throw err
|
|
36
|
+
}
|
|
37
|
+
|
|
38
|
+
const now = `${process.hrtime.bigint()}`
|
|
39
|
+
return {
|
|
40
|
+
status: 'completed',
|
|
41
|
+
text: scenario.text ?? `fake transcript for ${envelope.callId}`,
|
|
42
|
+
language: scenario.language ?? 'en',
|
|
43
|
+
durationSeconds: scenario.durationSeconds ?? 1,
|
|
44
|
+
cost: 0,
|
|
45
|
+
timestamps: { start: now, first: now, end: now }
|
|
46
|
+
}
|
|
47
|
+
}
|
|
48
|
+
|
|
49
|
+
/** @param {unknown} prompt */
|
|
50
|
+
function parseScenario (prompt) {
|
|
51
|
+
if (typeof prompt !== 'string') return {}
|
|
52
|
+
try { return JSON.parse(prompt) || {} } catch { return {} }
|
|
53
|
+
}
|
|
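To illustrate how a test might drive the fake adapter, here is a sketch of the scenario encoding. `parseScenario` is copied from the file above (it is private there); the prompt values are samples.

```js
// Copy of the private `parseScenario` helper from fake.js.
function parseScenario (prompt) {
  if (typeof prompt !== 'string') return {}
  try { return JSON.parse(prompt) || {} } catch { return {} }
}

// Scenario prompts a test could place on `envelope.prompt` (sample values).
const okPrompt = JSON.stringify({ mode: 'ok', text: 'hello world', durationSeconds: 12 })
const errorPrompt = JSON.stringify({ mode: 'error', type: 'PROVIDER_ERROR', message: 'boom', retryable: true })

console.log(parseScenario(okPrompt).text)    // hello world
console.log(parseScenario(errorPrompt).mode) // error

// An ordinary spelling hint is not JSON, so it parses to {} and the
// adapter falls through to mode "ok".
console.log(parseScenario('spell it "Mohdel"')) // {}
console.log(parseScenario(undefined))           // {}
```

Since the scenario travels in `prompt`, a fake-provider test cannot also assert on a real spelling hint in the same call.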
package/js/session/adapters/transcription/index.js
ADDED
|
@@ -0,0 +1,58 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Transcription-adapter registry. Mirrors session/adapters/image but
|
|
3
|
+
* scoped to speech-to-text providers.
|
|
4
|
+
*
|
|
5
|
+
* Groq, OpenAI, and Mistral all expose the same OpenAI-compatible
|
|
6
|
+
* `POST /audio/transcriptions` multipart endpoint, so each entry is
|
|
7
|
+
* the shared adapter bound to per-provider knobs:
|
|
8
|
+
*
|
|
9
|
+
* - `baseURL` — the provider's OpenAI-compatible API root.
|
|
10
|
+
* - `responseFormat` — `verbose_json` where supported (returns
|
|
11
|
+
* `duration`, needed for per-minute pricing). OpenAI's
|
|
12
|
+
* gpt-4o-*-transcribe models reject `verbose_json` (plain `json`
|
|
13
|
+
* returns token usage instead); Mistral rejects the field
|
|
14
|
+
* entirely (its default response already carries
|
|
15
|
+
* `usage.prompt_audio_seconds`).
|
|
16
|
+
*
|
|
17
|
+
* @module session/adapters/transcription
|
|
18
|
+
*/
|
|
19
|
+
|
|
20
|
+
import { createTranscriptionAdapter } from './openai_compatible.js'
|
|
21
|
+
import { fakeTranscription } from './fake.js'
|
|
22
|
+
|
|
23
|
+
const TRANSCRIPTION_ADAPTERS = {
|
|
24
|
+
groq: createTranscriptionAdapter({
|
|
25
|
+
baseURL: 'https://api.groq.com/openai/v1',
|
|
26
|
+
responseFormat: 'verbose_json'
|
|
27
|
+
}),
|
|
28
|
+
openai: createTranscriptionAdapter({
|
|
29
|
+
baseURL: 'https://api.openai.com/v1',
|
|
30
|
+
responseFormat: 'json'
|
|
31
|
+
}),
|
|
32
|
+
mistral: createTranscriptionAdapter({
|
|
33
|
+
baseURL: 'https://api.mistral.ai/v1'
|
|
34
|
+
}),
|
|
35
|
+
fake: fakeTranscription
|
|
36
|
+
}
|
|
37
|
+
|
|
38
|
+
/**
|
|
39
|
+
* @param {string} provider
|
|
40
|
+
* @returns {(
|
|
41
|
+
* env: import('#core/transcription.js').TranscriptionEnvelope,
|
|
42
|
+
* deps?: any
|
|
43
|
+
* ) => Promise<import('#core/transcription.js').TranscriptionResult>}
|
|
44
|
+
*/
|
|
45
|
+
export function getTranscriptionAdapter (provider) {
|
|
46
|
+
const adapter = TRANSCRIPTION_ADAPTERS[provider]
|
|
47
|
+
if (!adapter) throw new Error(`no transcription adapter for provider: ${provider}`)
|
|
48
|
+
return adapter
|
|
49
|
+
}
|
|
50
|
+
|
|
51
|
+
/**
|
|
52
|
+
* Whether the provider has a transcription adapter registered.
|
|
53
|
+
*
|
|
54
|
+
* @param {string} provider
|
|
55
|
+
*/
|
|
56
|
+
export function isTranscriptionProvider (provider) {
|
|
57
|
+
return Object.prototype.hasOwnProperty.call(TRANSCRIPTION_ADAPTERS, provider)
|
|
58
|
+
}
|
|
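`isTranscriptionProvider` checks own properties rather than using the `in` operator. A small sketch of why that distinction matters for a plain-object registry; `ADAPTERS` here is a stand-in with placeholder values, not the real adapter table.

```js
// Stand-in registry; the values are placeholders, not real adapters.
const ADAPTERS = {
  groq: () => 'groq',
  openai: () => 'openai',
  mistral: () => 'mistral',
  fake: () => 'fake'
}

const viaIn = (provider) => provider in ADAPTERS
const viaOwn = (provider) => Object.prototype.hasOwnProperty.call(ADAPTERS, provider)

console.log(viaIn('constructor'))  // true: inherited from Object.prototype
console.log(viaOwn('constructor')) // false
console.log(viaOwn('groq'))        // true

// A bracket lookup also sees inherited keys, so a truthiness check alone
// would not reject a provider string such as "constructor".
console.log(typeof ADAPTERS['constructor']) // function
```

Callers that accept provider names from untrusted input may therefore want to gate on `isTranscriptionProvider` before resolving an adapter.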
package/js/session/adapters/transcription/openai_compatible.js
ADDED
|
@@ -0,0 +1,177 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Shared transcription adapter for OpenAI-compatible
|
|
3
|
+
* `POST <baseURL>/audio/transcriptions` endpoints (multipart upload).
|
|
4
|
+
*
|
|
5
|
+
* Groq, Mistral, and OpenAI all implement the same endpoint shape;
|
|
6
|
+
* only the base URL and the supported `response_format` differ, so
|
|
7
|
+
* one adapter covers all three. Per-provider knobs are bound via
|
|
8
|
+
* `createTranscriptionAdapter` in `./index.js`.
|
|
9
|
+
*
|
|
10
|
+
* Duration extraction (for per-minute pricing) is response-shape
|
|
11
|
+
* dependent:
|
|
12
|
+
* - `body.duration` — whisper `verbose_json` (Groq)
|
|
13
|
+
* - `body.usage.seconds` — OpenAI duration-type usage
|
|
14
|
+
* - `body.usage.prompt_audio_seconds` — Mistral Voxtral
|
|
15
|
+
* OpenAI's gpt-4o-*-transcribe models report token usage instead;
|
|
16
|
+
* `computeTranscriptionCost` falls back to token pricing for those.
|
|
17
|
+
*
|
|
18
|
+
* @module session/adapters/transcription/openai_compatible
|
|
19
|
+
*/
|
|
20
|
+
|
|
21
|
+
import { readFile } from 'node:fs/promises'
|
|
22
|
+
import { basename } from 'node:path'
|
|
23
|
+
|
|
24
|
+
import { getSpec } from '../_catalog.js'
|
|
25
|
+
import { classifyProviderError } from '../_errors.js'
|
|
26
|
+
import { computeTranscriptionCost } from '../_pricing.js'
|
|
27
|
+
import { catalogKey, bareOf } from '#core/model-id.js'
|
|
28
|
+
|
|
29
|
+
/**
|
|
30
|
+
* @param {{baseURL: string, responseFormat?: string}} config
|
|
31
|
+
* @returns {(
|
|
32
|
+
* env: import('#core/transcription.js').TranscriptionEnvelope,
|
|
33
|
+
* deps?: {fetch?: typeof fetch, spec?: any}
|
|
34
|
+
* ) => Promise<import('#core/transcription.js').TranscriptionResult>}
|
|
35
|
+
*/
|
|
36
|
+
export function createTranscriptionAdapter ({ baseURL, responseFormat }) {
|
|
37
|
+
return async function transcription (envelope, deps = {}) {
|
|
38
|
+
const fetchFn = deps.fetch ?? globalThis.fetch
|
|
39
|
+
const spec = deps.spec ?? getSpec(catalogKey(envelope.model)) ?? {}
|
|
40
|
+
const start = String(process.hrtime.bigint())
|
|
41
|
+
|
|
42
|
+
const audio = await loadAudio(envelope.audio)
|
|
43
|
+
|
|
44
|
+
const form = new FormData()
|
|
45
|
+
form.append('model', spec.model ?? bareOf(envelope.model))
|
|
46
|
+
form.append('file', new Blob([audio.bytes], { type: audio.mimeType }), audio.filename)
|
|
47
|
+
if (responseFormat) form.append('response_format', responseFormat)
|
|
48
|
+
if (envelope.language) form.append('language', envelope.language)
|
|
49
|
+
if (envelope.prompt) form.append('prompt', envelope.prompt)
|
|
50
|
+
|
|
51
|
+
const root = (envelope.auth.baseURL || baseURL).replace(/\/$/, '')
|
|
52
|
+
let res
|
|
53
|
+
try {
|
|
54
|
+
res = await fetchFn(`${root}/audio/transcriptions`, {
|
|
55
|
+
method: 'POST',
|
|
56
|
+
headers: { Authorization: `Bearer ${envelope.auth.key}` },
|
|
57
|
+
body: form
|
|
58
|
+
})
|
|
59
|
+
} catch (e) {
|
|
60
|
+
throw typedError(classifyProviderError(e, envelope.auth?.key).message, 'NET_ERROR', true)
|
|
61
|
+
}
|
|
62
|
+
if (!res.ok) {
|
|
63
|
+
const text = await res.text().catch(() => '')
|
|
64
|
+
throw fromHttpStatus(res.status, 'transcription request failed', text.slice(0, 200))
|
|
65
|
+
}
|
|
66
|
+
|
|
67
|
+
const body = await res.json()
|
|
68
|
+
const durationSeconds = extractDuration(body)
|
|
69
|
+
const tokens = extractTokens(body)
|
|
70
|
+
const cost = computeTranscriptionCost(spec, { durationSeconds, ...tokens })
|
|
71
|
+
|
|
72
|
+
const end = String(process.hrtime.bigint())
|
|
73
|
+
return {
|
|
74
|
+
status: 'completed',
|
|
75
|
+
text: typeof body.text === 'string' ? body.text : '',
|
|
76
|
+
language: typeof body.language === 'string' ? body.language : null,
|
|
77
|
+
durationSeconds,
|
|
78
|
+
...tokens,
|
|
79
|
+
cost,
|
|
80
|
+
timestamps: { start, first: end, end }
|
|
81
|
+
}
|
|
82
|
+
}
|
|
83
|
+
}
|
|
84
|
+
|
|
85
|
+
/** @param {any} body */
|
|
86
|
+
function extractDuration (body) {
|
|
87
|
+
if (typeof body.duration === 'number') return body.duration
|
|
88
|
+
const u = body.usage
|
|
89
|
+
if (u && typeof u === 'object') {
|
|
90
|
+
if (typeof u.seconds === 'number') return u.seconds
|
|
91
|
+
if (typeof u.prompt_audio_seconds === 'number') return u.prompt_audio_seconds
|
|
92
|
+
}
|
|
93
|
+
return null
|
|
94
|
+
}
|
|
95
|
+
|
|
96
|
+
/** @param {any} body */
|
|
97
|
+
function extractTokens (body) {
|
|
98
|
+
const u = body.usage
|
|
99
|
+
if (!u || typeof u !== 'object') return {}
|
|
100
|
+
const out = {}
|
|
101
|
+
if (typeof u.input_tokens === 'number') out.inputTokens = u.input_tokens
|
|
102
|
+
if (typeof u.output_tokens === 'number') out.outputTokens = u.output_tokens
|
|
103
|
+
return out
|
|
104
|
+
}
|
|
105
|
+
|
|
106
|
+
// Multipart filename drives format sniffing on the provider side, so
|
|
107
|
+
// data: URIs need an extension synthesized from the MIME subtype.
|
|
108
|
+
const EXT_BY_MIME = {
|
|
109
|
+
'audio/mpeg': 'mp3',
|
|
110
|
+
'audio/mp4': 'm4a',
|
|
111
|
+
'audio/x-m4a': 'm4a',
|
|
112
|
+
'audio/wav': 'wav',
|
|
113
|
+
'audio/x-wav': 'wav',
|
|
114
|
+
'audio/webm': 'webm',
|
|
115
|
+
'audio/flac': 'flac',
|
|
116
|
+
'audio/x-flac': 'flac',
|
|
117
|
+
'audio/ogg': 'ogg',
|
|
118
|
+
'audio/opus': 'opus'
|
|
119
|
+
}
|
|
120
|
+
|
|
121
|
+
/**
|
|
122
|
+
* `file://` and `data:` URIs only — providers require multipart
|
|
123
|
+
* upload, so remote `https://` audio would mean mohdel silently
|
|
124
|
+
* downloading arbitrary URLs; the caller owns that step.
|
|
125
|
+
*
|
|
126
|
+
* @param {import('#core/transcription.js').AudioRef} audio
|
|
127
|
+
* @returns {Promise<{bytes: Buffer, mimeType: string, filename: string}>}
|
|
128
|
+
*/
|
|
129
|
+
export async function loadAudio (audio) {
|
|
130
|
+
if (!audio?.fileUri || !audio?.mimeType) {
|
|
131
|
+
throw typedError('transcription requires audio {fileUri, mimeType}', 'SESSION_INVALID_AUDIO', false)
|
|
132
|
+
}
|
|
133
|
+
const { fileUri, mimeType } = audio
|
|
134
|
+
if (fileUri.startsWith('file://')) {
|
|
135
|
+
const path = fileUri.replace(/^file:\/\//, '')
|
|
136
|
+
let bytes
|
|
137
|
+
try {
|
|
138
|
+
bytes = await readFile(path)
|
|
139
|
+
} catch (e) {
|
|
140
|
+
throw typedError('audio file unreadable', 'SESSION_INVALID_AUDIO', false, messageOf(e))
|
|
141
|
+
}
|
|
142
|
+
return { bytes, mimeType, filename: basename(path) }
|
|
143
|
+
}
|
|
144
|
+
if (fileUri.startsWith('data:')) {
|
|
145
|
+
const parts = fileUri.split(',')
|
|
146
|
+
if (parts.length < 2) {
|
|
147
|
+
throw typedError('malformed audio data URI', 'SESSION_INVALID_AUDIO', false)
|
|
148
|
+
}
|
|
149
|
+
const ext = EXT_BY_MIME[mimeType] || mimeType.split('/').pop() || 'bin'
|
|
150
|
+
return { bytes: Buffer.from(parts[1], 'base64'), mimeType, filename: `audio.${ext}` }
|
|
151
|
+
}
|
|
152
|
+
throw typedError(
|
|
153
|
+
`unsupported audio URI scheme: ${fileUri.slice(0, 32)}…`,
|
|
154
|
+
'SESSION_INVALID_AUDIO',
|
|
155
|
+
false
|
|
156
|
+
)
|
|
157
|
+
}
|
|
158
|
+
|
|
159
|
+
function fromHttpStatus (status, message, detail) {
|
|
160
|
+
const typed = classifyProviderError({ status })
|
|
161
|
+
// Keep the classifier's message (stable/machine-readable); response
|
|
162
|
+
// body snippets go to `detail` only (F45).
|
|
163
|
+
return typedError(typed.message, typed.type, typed.retryable, detail ? `${message}: ${detail}` : message)
|
|
164
|
+
}
|
|
165
|
+
|
|
166
|
+
function typedError (message, type, retryable, detail) {
|
|
167
|
+
const err = new Error(message)
|
|
168
|
+
const typed = { message, severity: retryable ? 'warn' : 'error', retryable, type }
|
|
169
|
+
if (detail) typed.detail = detail
|
|
170
|
+
err.typed = typed
|
|
171
|
+
return err
|
|
172
|
+
}
|
|
173
|
+
|
|
174
|
+
/** @param {unknown} e */
|
|
175
|
+
function messageOf (e) {
|
|
176
|
+
return e instanceof Error ? e.message : String(e)
|
|
177
|
+
}
|
|
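The module header lists three places a duration can appear in a provider response. The sketch below runs the same lookup order over trimmed, illustrative bodies; the sample objects contain only the fields the helper reads and are not captured provider output.

```js
// Same logic as the private `extractDuration` helper in this file.
function extractDuration (body) {
  if (typeof body.duration === 'number') return body.duration
  const u = body.usage
  if (u && typeof u === 'object') {
    if (typeof u.seconds === 'number') return u.seconds
    if (typeof u.prompt_audio_seconds === 'number') return u.prompt_audio_seconds
  }
  return null
}

// Illustrative bodies, one per shape named in the module header.
const whisperVerbose = { text: 'hi', duration: 12.4 }
const durationUsage = { text: 'hi', usage: { seconds: 30 } }
const voxtral = { text: 'hi', usage: { prompt_audio_seconds: 45 } }
const tokenUsage = { text: 'hi', usage: { input_tokens: 120, output_tokens: 18 } }

console.log(extractDuration(whisperVerbose)) // 12.4
console.log(extractDuration(durationUsage))  // 30
console.log(extractDuration(voxtral))        // 45
console.log(extractDuration(tokenUsage))     // null, so cost falls back to token pricing
```

A top-level `duration` takes priority over anything under `usage`, which matters only if a provider ever reports both.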
package/js/session/run_transcription.js
ADDED
|
@@ -0,0 +1,64 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Dispatch a TranscriptionEnvelope to the matching transcription
|
|
3
|
+
* adapter.
|
|
4
|
+
*
|
|
5
|
+
* Transcription is a single request/response — no streaming — so
|
|
6
|
+
* this returns a Promise of `TranscriptionResult` rather than an
|
|
7
|
+
* event generator. On adapter failure, the resolved error is a
|
|
8
|
+
* `TypedError` (structured, serializable) rather than a thrown JS
|
|
9
|
+
* `Error`.
|
|
10
|
+
*
|
|
11
|
+
* Like the image path, transcription skips rate-limit and cooldown —
|
|
12
|
+
* low-frequency one-shots that don't justify the per-call tracking
|
|
13
|
+
* overhead.
|
|
14
|
+
*
|
|
15
|
+
* @module session/run_transcription
|
|
16
|
+
*/
|
|
17
|
+
|
|
18
|
+
import { getTranscriptionAdapter } from './adapters/transcription/index.js'
|
|
19
|
+
import { classifyProviderError } from './adapters/_errors.js'
|
|
20
|
+
import { providerOf } from '#core/model-id.js'
|
|
21
|
+
|
|
22
|
+
/**
|
|
23
|
+
* @param {import('#core/transcription.js').TranscriptionEnvelope} envelope
|
|
24
|
+
* @param {{
|
|
25
|
+
* resolveAdapter?: (provider: string) => (
|
|
26
|
+
* env: import('#core/transcription.js').TranscriptionEnvelope,
|
|
27
|
+
* deps?: any
|
|
28
|
+
* ) => Promise<import('#core/transcription.js').TranscriptionResult>,
|
|
29
|
+
* spec?: any
|
|
30
|
+
* }} [options]
|
|
31
|
+
* @returns {Promise<
|
|
32
|
+
* | {ok: true, result: import('#core/transcription.js').TranscriptionResult}
|
|
33
|
+
* | {ok: false, error: import('#core/errors.js').TypedError}
|
|
34
|
+
* >}
|
|
35
|
+
*/
|
|
36
|
+
export async function runTranscription (envelope, { resolveAdapter = getTranscriptionAdapter, spec } = {}) {
|
|
37
|
+
let adapter
|
|
38
|
+
try {
|
|
39
|
+
adapter = resolveAdapter(providerOf(envelope.model))
|
|
40
|
+
} catch (e) {
|
|
41
|
+
return {
|
|
42
|
+
ok: false,
|
|
43
|
+
error: {
|
|
44
|
+
message: messageOf(e),
|
|
45
|
+
severity: 'error',
|
|
46
|
+
retryable: false,
|
|
47
|
+
type: 'SESSION_UNKNOWN_PROVIDER'
|
|
48
|
+
}
|
|
49
|
+
}
|
|
50
|
+
}
|
|
51
|
+
|
|
52
|
+
try {
|
|
53
|
+
const result = await adapter(envelope, spec ? { spec } : {})
|
|
54
|
+
return { ok: true, result }
|
|
55
|
+
} catch (e) {
|
|
56
|
+
const typed = /** @type {any} */(e).typed || classifyProviderError(e, envelope.auth?.key)
|
|
57
|
+
return { ok: false, error: typed }
|
|
58
|
+
}
|
|
59
|
+
}
|
|
60
|
+
|
|
61
|
+
/** @param {unknown} e */
|
|
62
|
+
function messageOf (e) {
|
|
63
|
+
return e instanceof Error ? e.message : String(e)
|
|
64
|
+
}
|
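Note on the result envelope above: `runTranscription` resolves to `{ ok: true, result }` or `{ ok: false, error }` instead of throwing, and it accepts an injectable `resolveAdapter`. The sketch below reproduces that control flow with stand-ins so it can run on its own. `runTranscriptionSketch`, the fake resolver, the `providerOf` split on `/`, and the `PROVIDER_ERROR` type are assumptions for illustration. The real `providerOf` and `classifyProviderError` are only imported in this hunk, not shown.

```javascript
// Stand-ins (assumptions), NOT the package's implementations.
const providerOf = (modelId) => modelId.split('/')[0]
const classifyProviderError = (e) => ({
  message: e instanceof Error ? e.message : String(e),
  severity: 'error',
  retryable: false,
  type: 'PROVIDER_ERROR' // hypothetical type name
})

// Same control flow as runTranscription in the diff: never rejects on adapter failure.
async function runTranscriptionSketch (envelope, { resolveAdapter, spec } = {}) {
  let adapter
  try {
    adapter = resolveAdapter(providerOf(envelope.model))
  } catch (e) {
    return {
      ok: false,
      error: {
        message: e instanceof Error ? e.message : String(e),
        severity: 'error',
        retryable: false,
        type: 'SESSION_UNKNOWN_PROVIDER'
      }
    }
  }
  try {
    const result = await adapter(envelope, spec ? { spec } : {})
    return { ok: true, result }
  } catch (e) {
    // Prefer the adapter's own typed payload; classify anything else.
    return { ok: false, error: e.typed || classifyProviderError(e) }
  }
}

// Hypothetical resolver used only for this demo.
const resolveAdapter = (provider) => {
  if (provider === 'fake') return async (env) => ({ text: `transcript of ${env.model}` })
  if (provider === 'flaky') return async () => { throw new Error('upstream timed out') }
  throw new Error(`no transcription adapter for ${provider}`)
}

for (const model of ['fake/demo', 'flaky/demo', 'nope/demo']) {
  runTranscriptionSketch({ model }, { resolveAdapter })
    .then((r) => console.log(model, r.ok ? r.result.text : r.error.type))
}
```

Injecting the resolver this way lets a test swap in a fake adapter without touching the adapter registry, which appears to be the purpose of the `resolveAdapter` option in the diff.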
package/package.json
CHANGED
|
@@ -1,12 +1,12 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "mohdel",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.111.0",
|
|
4
4
|
"license": "MIT",
|
|
5
5
|
"author": {
|
|
6
6
|
"name": "Christophe Le Bars",
|
|
7
7
|
"email": "clb@toort.net"
|
|
8
8
|
},
|
|
9
|
-
"description": "Self-hosted LLM gateway
|
|
9
|
+
"description": "Self-hosted LLM gateway and SDK for Node — a LiteLLM-style unified API for 11 providers (Anthropic, OpenAI, Gemini, Mistral, Groq, xAI, DeepSeek, OpenRouter, …) with per-call USD cost tracking, streaming, tool calls, vision, speech-to-text, and built-in OpenTelemetry. Run in-process, or behind the process-isolated thin-gate for fault containment.",
|
|
10
10
|
"type": "module",
|
|
11
11
|
"repository": {
|
|
12
12
|
"type": "git",
|
|
@@ -87,10 +87,10 @@
|
|
|
87
87
|
"@opentelemetry/exporter-trace-otlp-grpc": "^0.218.0",
|
|
88
88
|
"@opentelemetry/sdk-node": "^0.218.0",
|
|
89
89
|
"chalk": "^5.4.0",
|
|
90
|
-
"mohdel-thin-gate-linux-x64-gnu": "0.
|
|
90
|
+
"mohdel-thin-gate-linux-x64-gnu": "0.111.0"
|
|
91
91
|
},
|
|
92
92
|
"dependencies": {
|
|
93
|
-
"@anthropic-ai/sdk": "^0.
|
|
93
|
+
"@anthropic-ai/sdk": "^0.104.1",
|
|
94
94
|
"@cerebras/cerebras_cloud_sdk": "^1.61.1",
|
|
95
95
|
"@google/genai": "^2.8.0",
|
|
96
96
|
"@opentelemetry/api": "^1.9.1",
|
package/src/cli/ask.js
CHANGED
|
@@ -5,8 +5,8 @@ const noop = () => {}
|
|
|
5
5
|
|
|
6
6
|
// Friendly next-step hints for common ask-time failures. Pure pattern match on
|
|
7
7
|
// err.message — keeps the lib layer neutral, but gives CLI users a copy-pasteable
|
|
8
|
-
// command instead of just an error.
|
|
9
|
-
const hintsForError = (err, modelId) => {
|
|
8
|
+
// command instead of just an error. Shared with `mo transcribe`.
|
|
9
|
+
export const hintsForError = (err, modelId) => {
|
|
10
10
|
const msg = String(err?.message || '')
|
|
11
11
|
const detail = String(err?.detail || '')
|
|
12
12
|
const both = `${msg}\n${detail}`
|
package/src/cli/index.js
CHANGED
|
@@ -55,6 +55,7 @@ Commands:
|
|
|
55
55
|
ratelimit provider rm <p> Remove provider-level limits
|
|
56
56
|
|
|
57
57
|
ask <provider/model> [prompt] One-shot inference (pipeable)
|
|
58
|
+
transcribe <provider/model> <file> Speech → text from an audio file
|
|
58
59
|
|
|
59
60
|
default Set default model (interactive)
|
|
60
61
|
doctor Check that your install is wired up
|
|
@@ -133,6 +134,9 @@ if (resolved === 'default') {
|
|
|
133
134
|
} else if (resolved === 'ask') {
|
|
134
135
|
const { runAsk } = await import('./ask.js')
|
|
135
136
|
await runAsk(resolvedArgs)
|
|
137
|
+
} else if (resolved === 'transcribe') {
|
|
138
|
+
const { runTranscribe } = await import('./transcribe.js')
|
|
139
|
+
await runTranscribe(resolvedArgs)
|
|
136
140
|
} else if (resolved === 'model') {
|
|
137
141
|
const { runModel } = await import('./model.js')
|
|
138
142
|
await runModel(resolvedArgs)
|
package/src/cli/model.js
CHANGED
|
@@ -269,7 +269,7 @@ Examples:
|
|
|
269
269
|
Required fields (asked if not pre-filled):
|
|
270
270
|
model the literal id sent to the provider's API
|
|
271
271
|
creator who trained the model (e.g. anthropic, openai, alibaba)
|
|
272
|
-
inputFormat subset of [text, image, video]
|
|
272
|
+
inputFormat subset of [text, image, video, audio]
|
|
273
273
|
|
|
274
274
|
See docs/CATALOG.md for the full field reference, and
|
|
275
275
|
config/curated.example.json for ready-to-copy entries.`)
|
package/src/cli/transcribe.js
|
@@ -0,0 +1,145 @@
|
|
|
1
|
+
import { resolve, extname } from 'node:path'
|
|
2
|
+
|
|
3
|
+
import mohdel, { silent } from '../lib/index.js'
|
|
4
|
+
import { loadDefaultEnv } from '../lib/common.js'
|
|
5
|
+
import { hintsForError } from './ask.js'
|
|
6
|
+
|
|
7
|
+
const noop = () => {}
|
|
8
|
+
|
|
9
|
+
const MIME_BY_EXT = {
|
|
10
|
+
'.mp3': 'audio/mpeg',
|
|
11
|
+
'.mpga': 'audio/mpeg',
|
|
12
|
+
'.m4a': 'audio/mp4',
|
|
13
|
+
'.mp4': 'audio/mp4',
|
|
14
|
+
'.wav': 'audio/wav',
|
|
15
|
+
'.webm': 'audio/webm',
|
|
16
|
+
'.flac': 'audio/flac',
|
|
17
|
+
'.ogg': 'audio/ogg',
|
|
18
|
+
'.opus': 'audio/opus'
|
|
19
|
+
}
|
|
20
|
+
|
|
21
|
+
export async function runTranscribe (args) {
|
|
22
|
+
if (args.includes('-h') || args.includes('--help')) {
|
|
23
|
+
console.log(`mohdel transcribe — speech → text, pipeable
|
|
24
|
+
|
|
25
|
+
Usage:
|
|
26
|
+
mo transcribe <model> <audio-file>
|
|
27
|
+
|
|
28
|
+
Options:
|
|
29
|
+
--language <iso> ISO-639-1 language hint (e.g. en, fr)
|
|
30
|
+
--prompt <text> Spelling/context hint forwarded to the provider
|
|
31
|
+
--mime <type> Override the MIME type guessed from the extension
|
|
32
|
+
--json Output full result as JSON
|
|
33
|
+
-v, --verbose Show debug info on stderr
|
|
34
|
+
|
|
35
|
+
Output:
|
|
36
|
+
stdout: transcript text (raw — or JSON with --json)
|
|
37
|
+
stderr: model name + duration/cost summary
|
|
38
|
+
|
|
39
|
+
Known extensions: ${Object.keys(MIME_BY_EXT).join(' ')}
|
|
40
|
+
|
|
41
|
+
Examples:
|
|
42
|
+
mo transcribe groq/whisper-large-v3-turbo meeting.mp3
|
|
43
|
+
mo transcribe mistral/voxtral-mini-transcribe interview.wav --language fr
|
|
44
|
+
mo transcribe groq/whisper-large-v3-turbo memo.m4a --json | jq .cost`)
|
|
45
|
+
process.exit(0)
|
|
46
|
+
}
|
|
47
|
+
|
|
48
|
+
loadDefaultEnv()
|
|
49
|
+
|
|
50
|
+
const flagVal = (name) => {
|
|
51
|
+
const idx = args.indexOf(name)
|
|
52
|
+
if (idx === -1) return undefined
|
|
53
|
+
const val = args[idx + 1]
|
|
54
|
+
args.splice(idx, 2)
|
|
55
|
+
return val
|
|
56
|
+
}
|
|
57
|
+
const flag = (name) => {
|
|
58
|
+
const idx = args.indexOf(name)
|
|
59
|
+
if (idx === -1) return false
|
|
60
|
+
args.splice(idx, 1)
|
|
61
|
+
return true
|
|
62
|
+
}
|
|
63
|
+
|
|
64
|
+
const json = flag('--json')
|
|
65
|
+
const verbose = flag('--verbose') || flag('-v')
|
|
66
|
+
const language = flagVal('--language')
|
|
67
|
+
const prompt = flagVal('--prompt')
|
|
68
|
+
const mimeOverride = flagVal('--mime')
|
|
69
|
+
|
|
70
|
+
const [modelId, file] = args
|
|
71
|
+
if (!modelId || !file) {
|
|
72
|
+
console.error('Usage: mo transcribe <model> <audio-file>')
|
|
73
|
+
process.exit(1)
|
|
74
|
+
}
|
|
75
|
+
|
|
76
|
+
const mimeType = mimeOverride || MIME_BY_EXT[extname(file).toLowerCase()]
|
|
77
|
+
if (!mimeType) {
|
|
78
|
+
console.error(`Unknown audio extension '${extname(file)}'. Pass --mime <type> (e.g. --mime audio/mpeg).`)
|
|
79
|
+
process.exit(1)
|
|
80
|
+
}
|
|
81
|
+
|
|
82
|
+
const log = verbose ? (...args) => process.stderr.write(`${args.map(a => typeof a === 'string' ? a : JSON.stringify(a)).join(' ')}\n`) : noop
|
|
83
|
+
const logger = {
|
|
84
|
+
...silent,
|
|
85
|
+
debug: verbose ? log : noop,
|
|
86
|
+
info: log,
|
|
87
|
+
warn: log,
|
|
88
|
+
error: log,
|
|
89
|
+
fatal: log
|
|
90
|
+
}
|
|
91
|
+
const mo = await mohdel({ logger })
|
|
92
|
+
|
|
93
|
+
let model
|
|
94
|
+
try {
|
|
95
|
+
model = mo.use(modelId)
|
|
96
|
+
} catch (err) {
|
|
97
|
+
console.error(err.message)
|
|
98
|
+
for (const h of hintsForError(err, modelId)) console.error(h)
|
|
99
|
+
process.exit(1)
|
|
100
|
+
}
|
|
101
|
+
|
|
102
|
+
const options = {}
|
|
103
|
+
if (language) options.language = language
|
|
104
|
+
if (prompt) options.prompt = prompt
|
|
105
|
+
|
|
106
|
+
process.stderr.write(`${model.id}\n`)
|
|
107
|
+
|
|
108
|
+
try {
|
|
109
|
+
const result = await model.transcribe(
|
|
110
|
+
{ fileUri: `file://${resolve(file)}`, mimeType },
|
|
111
|
+
options
|
|
112
|
+
)
|
|
113
|
+
|
|
114
|
+
if (json) {
|
|
115
|
+
console.log(JSON.stringify({
|
|
116
|
+
model: model.id,
|
|
117
|
+
text: result.text,
|
|
118
|
+
language: result.language,
|
|
119
|
+
durationSeconds: result.durationSeconds,
|
|
120
|
+
inputTokens: result.inputTokens,
|
|
121
|
+
outputTokens: result.outputTokens,
|
|
122
|
+
cost: result.cost ?? null,
|
|
123
|
+
status: result.status
|
|
124
|
+
}, null, 2))
|
|
125
|
+
} else {
|
|
126
|
+
process.stdout.write(result.text)
|
|
127
|
+
if (result.text && !result.text.endsWith('\n')) process.stdout.write('\n')
|
|
128
|
+
}
|
|
129
|
+
|
|
130
|
+
const summary = []
|
|
131
|
+
if (result.durationSeconds != null) summary.push(`${result.durationSeconds}s audio`)
|
|
132
|
+
if (result.inputTokens) summary.push(`${result.inputTokens} in`)
|
|
133
|
+
if (result.outputTokens) summary.push(`${result.outputTokens} out`)
|
|
134
|
+
if (result.cost != null) summary.push(`$${result.cost.toFixed(4)}`)
|
|
135
|
+
const ts = result.timestamps
|
|
136
|
+
if (ts?.start && ts?.end) {
|
|
137
|
+
summary.push(`${Math.round(Number(BigInt(ts.end) - BigInt(ts.start)) / 1e6)}ms total`)
|
|
138
|
+
}
|
|
139
|
+
if (summary.length) process.stderr.write(`${summary.join(', ')}\n`)
|
|
140
|
+
} catch (err) {
|
|
141
|
+
console.error(`Error: ${err.detail || err.message}`)
|
|
142
|
+
for (const h of hintsForError(err, modelId)) console.error(h)
|
|
143
|
+
process.exit(1)
|
|
144
|
+
}
|
|
145
|
+
}
|
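Note on the CLI argument handling above: `flagVal` and `flag` splice each recognized option out of `args`, so whatever remains is the positional `<model> <audio-file>` pair regardless of where the options appeared. A minimal sketch of that approach follows. `parseTranscribeArgs` and `elapsedMs` are hypothetical names, the parser works on a copy of the array rather than mutating its input, and the timestamp unit is assumed to be nanoseconds because the diff divides by `1e6` to print milliseconds.

```javascript
// Hypothetical wrapper around the flag helpers shown in the diff.
function parseTranscribeArgs (argv) {
  const args = [...argv] // copy, so the caller's array is left untouched

  // Option with a value: remove both the flag and the token after it.
  const flagVal = (name) => {
    const idx = args.indexOf(name)
    if (idx === -1) return undefined
    const val = args[idx + 1]
    args.splice(idx, 2)
    return val
  }
  // Boolean option: remove just the flag.
  const flag = (name) => {
    const idx = args.indexOf(name)
    if (idx === -1) return false
    args.splice(idx, 1)
    return true
  }

  const json = flag('--json')
  const verbose = flag('--verbose') || flag('-v')
  const language = flagVal('--language')
  const prompt = flagVal('--prompt')
  const mime = flagVal('--mime')

  // Everything left over is positional.
  const [modelId, file] = args
  return { modelId, file, json, verbose, language, prompt, mime }
}

// Assumes start/end are nanosecond counts (string, number, or bigint).
function elapsedMs (start, end) {
  return Math.round(Number(BigInt(end) - BigInt(start)) / 1e6)
}

console.log(parseTranscribeArgs(['groq/whisper-large-v3-turbo', '--language', 'fr', 'memo.m4a', '--json']))
console.log(elapsedMs('1000000000', '1250000000'))
```

One edge case carried over from the original helpers: a value-taking option given as the last token (for example a trailing `--language`) yields `undefined` and is silently ignored rather than reported.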
package/src/lib/index.js
CHANGED
|
@@ -14,7 +14,7 @@ import {
|
|
|
14
14
|
} from './curated-cache.js'
|
|
15
15
|
import { createRateLimiter } from '../../js/session/_rate_limiter.js'
|
|
16
16
|
import { createCooldownTracker } from '../../js/session/_cooldown.js'
|
|
17
|
-
import { runAnswer, runAnswerImage } from '../../js/factory/bridge.js'
|
|
17
|
+
import { runAnswer, runAnswerImage, runAnswerTranscription } from '../../js/factory/bridge.js'
|
|
18
18
|
import { startSpan, endSpanOk, endSpanError } from './tracing.js'
|
|
19
19
|
import { isValidTag } from './schema.js'
|
|
20
20
|
import { silent } from './logger.js'
|
|
@@ -644,6 +644,20 @@ const createModelProxy = (resolvedModelId, modelSpec, handlers, aliasOutputEffor
|
|
|
644
644
|
}
|
|
645
645
|
}
|
|
646
646
|
|
|
647
|
+
if (prop === 'transcribe') {
|
|
648
|
+
return async (audio, options = {}) => {
|
|
649
|
+
const { configuration } = await getRuntime()
|
|
650
|
+
return runAnswerTranscription({
|
|
651
|
+
provider: modelSpec.provider,
|
|
652
|
+
model: modelSpec.model ?? resolvedModelId.split('/').pop(),
|
|
653
|
+
configuration,
|
|
654
|
+
audio,
|
|
655
|
+
options,
|
|
656
|
+
spec: modelSpec
|
|
657
|
+
})
|
|
658
|
+
}
|
|
659
|
+
}
|
|
660
|
+
|
|
647
661
|
if (prop === 'setRateLimit') {
|
|
648
662
|
return async ({ rpm, tpm } = {}) => {
|
|
649
663
|
const curatedCache = getCuratedCacheSnapshot()
|
package/src/lib/schema.js
CHANGED
|
@@ -27,6 +27,7 @@ const fieldDefs = {
|
|
|
27
27
|
imagePrice: { type: 'number' },
|
|
28
28
|
imageEndpoint: { type: 'string' },
|
|
29
29
|
imageDefaultSize: { type: 'string' },
|
|
30
|
+
transcriptionPrice: { type: 'number' },
|
|
30
31
|
deprecated: { type: 'string' },
|
|
31
32
|
suspended: { type: 'string' },
|
|
32
33
|
rpmLimit: { type: 'number' },
|