npm - mohdel - Versions diffs - 0.110.0 → 0.111.0 - Mend

mohdel 0.110.0 → 0.111.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

package/README.md +57 -19
package/config/curated.example.json +12 -0
package/config/curated.schema.json +4 -3
package/js/core/transcription.js +70 -0
package/js/factory/bridge.js +34 -0
package/js/session/adapters/_pricing.js +32 -0
package/js/session/adapters/transcription/fake.js +53 -0
package/js/session/adapters/transcription/index.js +58 -0
package/js/session/adapters/transcription/openai_compatible.js +177 -0
package/js/session/run_transcription.js +64 -0
package/package.json +4 -4
package/src/cli/ask.js +2 -2
package/src/cli/index.js +4 -0
package/src/cli/model.js +1 -1
package/src/cli/transcribe.js +145 -0
package/src/lib/index.js +15 -1
package/src/lib/schema.js +1 -0

package/README.md CHANGED

@@ -1,6 +1,6 @@
 # Mohdel
-One Node API and one CLI for 11 LLM providers — call any model with the same `answer()` shape, get tokens and per-call USD cost back, swap models by changing one string. Self-hosted: your keys, your infra, no SaaS proxy in the path.
+Self-hosted LLM gateway and SDK for Node — think LiteLLM, for the JS world. One `answer()` call for 11 providers; swap models by changing one string; get real per-call USD cost back on every result, with OpenTelemetry built in and process isolation when you need it. Your keys, your infra, no SaaS proxy in the path.
 ```bash
 npm install -g mohdel
@@ -12,15 +12,49 @@ Providers: Anthropic, OpenAI, Gemini, Mistral, Groq, xAI, Cerebras, Fireworks, D
 ## Why mohdel
+- **Real numbers on every call.** Token counts and per-call USD cost computed from your own pricing catalog (`curated.json`) — not estimates, not provider-specific shapes. Bill tenants, alert on spend, reconcile invoices. See [docs/CATALOG.md](docs/CATALOG.md) for the catalog format.
 - **One interface across providers.** Same `answer()` call, same event stream, same `{ status, output, inputTokens, outputTokens, cost }` result. Switching from `anthropic/claude-sonnet-4-6` to `openai/gpt-5.4-mini` is one string change — adapter differences stay inside mohdel.
-- **Real numbers on every call.** Token counts and per-call USD cost computed from your own pricing catalog (`curated.json`) — not estimates, not provider-specific shapes. See [docs/CATALOG.md](docs/CATALOG.md) for the catalog format.
+- **Self-hosted, no vendor in the path.** API keys live in `~/.config/mohdel/`. Mohdel calls provider APIs directly; nothing routes through a third party, nothing marks up your tokens, no extra hop of availability risk.
 - **Observability without instrumentation.** OpenTelemetry spans, trace-linked logs, and OTLP metrics over one endpoint. Set `OTEL_EXPORTER_OTLP_ENDPOINT`; everything else is wired.
 - **Two integration paths, same API.** In-process factory for CLI tools, scripts, single-process services. Optional `thin-gate` subprocess for fault isolation, cross-process quota, and any-language HTTP callers — no code change to switch.
-- **Self-hosted, no vendor in the path.** API keys live in `~/.config/mohdel/`. Mohdel calls provider APIs directly; nothing routes through a third party.
+## How it compares
+The one-paragraph version: **LiteLLM** is the closest analog but lives in
+Python; **Vercel AI SDK** is an application toolkit, not an infra layer;
+**OpenRouter** is the same one-API promise as a SaaS in your request path;
+**raw provider SDKs** are N different shapes with no cost accounting.
+|  | mohdel | LiteLLM | Vercel AI SDK | OpenRouter | Raw SDKs |
+|---|---|---|---|---|---|
+| Runs in a Node stack natively | yes | Python service | yes | n/a (SaaS) | yes |
+| Per-call USD cost on the result | yes | yes | no | yes | no |
+| Self-hosted, keys never leave your infra | yes | yes | yes | no | yes |
+| Provider-SDK process isolation | yes (thin-gate) | proxy only | no | n/a | no |
+| OTel spans + metrics out of the box | yes | via callbacks | no | no | no |
+| UI streaming helpers, structured output, agents | no — by design | no | yes | no | varies |
+- **vs LiteLLM** — same core promise (unified calls, cost tracking,
+  self-hosted gateway), but Node-native: if your stack is JS, there's no
+  Python sidecar to deploy, version, and monitor. The honest gap: LiteLLM's
+  proxy exposes an OpenAI-compatible endpoint and admin features (virtual
+  keys, budgets); thin-gate speaks its own [wire protocol](PROTOCOL.md) —
+  callers use the JS client or implement the protocol.
+- **vs Vercel AI SDK** — different layer, not a rival. The AI SDK is an
+  application toolkit (UI streaming, structured outputs, agent loops) with no
+  per-call cost, no gateway, no process isolation. Use it *above* mohdel if
+  you like it — mohdel is the inference primitive underneath.
+- **vs OpenRouter** — the self-hosted version of the same idea. With a SaaS
+  router you accept their uptime, their markup, and your prompts transiting
+  their infra. Mohdel goes direct to providers with your keys — and ships an
+  `openrouter` adapter for when you want both.
+- **vs raw provider SDKs** — no abstraction tax to escape later: mohdel's
+  envelope is flat and close to the SDKs underneath, and `cost`/`tokens`
+  come back normalized so you never parse five different usage shapes.
 ## Documentation
-- [INTEGRATION.md](INTEGRATION.md) — JS library guide (factory, client, answer options, tools, streaming, vision, errors, OTel)
+- [INTEGRATION.md](INTEGRATION.md) — JS library guide (factory, client, answer options, tools, streaming, vision, transcription, errors, OTel)
 - [docs/COOKBOOK.md](docs/COOKBOOK.md) — copy-paste recipes (summarize a file, stream, swap providers, tools, vision, batch + cost)
 - [docs/CATALOG.md](docs/CATALOG.md) — `curated.json` walkthrough with worked examples
 - [docs/GLOSSARY.md](docs/GLOSSARY.md) — short definitions for envelope, thin-gate, session, creator vs provider, status, …
@@ -67,6 +101,10 @@ mo ask anthropic/claude-sonnet-4-6 --stream "write a haiku about recursion"
 # With thinking effort
 mo ask anthropic/claude-opus-4-6 --effort high "prove P != NP"
+# Speech → text from an audio file
+mo transcribe groq/whisper-large-v3-turbo meeting.mp3
+mo transcribe mistral/voxtral-mini-transcribe interview.wav --language fr
 # Browse the model catalog
 mo ls                                  # list all curated models
 mo ls --sort price                     # sorted by input price
@@ -101,9 +139,21 @@ All list/show commands support `--json [fields]` — bare `--json` lists availab
 ## Library Usage
-Two integration paths: the **client** (primary, cross-process) and the **factory** (in-process shortcut).
+Two integration paths, same adapters underneath: start with the in-process **factory**; graduate to the cross-process **client** when you want gateway-grade isolation.
+### Factory — in-process (start here)
+```js
+import mohdel from 'mohdel'
+const mo = await mohdel()
+const result = await mo.use('anthropic/claude-sonnet-4-6').answer('Hello')
+console.log(result.output, result.cost)
+```
+No subprocess, no setup beyond your API key. Right for CLI tools (`mo ask`), scripts, tests, and single-process services — which is most projects.
-### Client — cross-process (recommended)
+### Client — cross-process (the production gateway)
 ```js
 import { call } from 'mohdel/client'
@@ -119,19 +169,7 @@ for await (const ev of call(envelope, { socketPath: '/tmp/mohdel-data.sock' }))
 }
 ```
-Requires a running `thin-gate` subprocess. See [INTEGRATION.md §Client](INTEGRATION.md#client-cross-process--primary-production-integration) for setup.
-### Factory — in-process shortcut
-```js
-import mohdel from 'mohdel'
-const mo = await mohdel()
-const result = await mo.use('anthropic/claude-sonnet-4-6').answer('Hello')
-console.log(result.output, result.cost)
-```
-No subprocess; the factory runs the same session adapters inline. Right for CLI (`mo ask`), scripts, tests, single-process services.
+Same API, but inference runs in a pooled subprocess behind the `thin-gate` supervisor (Rust): a crashing provider SDK can't take your service down, quota is enforced across processes, and non-JS callers can speak the same wire. Switching from factory to client is a configuration change, not a rewrite. See [INTEGRATION.md §Client](INTEGRATION.md#client-cross-process--primary-production-integration) for setup.
 For the full API — initialization, alias resolution, answer options, response shape, tool use, streaming, vision, error handling, OpenTelemetry, sub-path exports — see **[INTEGRATION.md](INTEGRATION.md)**.

package/config/curated.example.json CHANGED

@@ -61,6 +61,18 @@
     "tags": ["image"]
   },
+  "groq/whisper-large-v3-turbo": {
+    "_comment_f": "Transcription entry. type:'transcription' selects the speech-to-text dispatcher (multipart upload, no streaming). transcriptionPrice is per audio MINUTE — Groq's $0.04/hour ≈ $0.000667/min. Token-billed models (openai/gpt-4o-mini-transcribe) use inputPrice/outputPrice instead.",
+    "model": "whisper-large-v3-turbo",
+    "creator": "openai",
+    "provider": "groq",
+    "label": "Whisper Large v3 Turbo",
+    "inputFormat": ["audio"],
+    "type": "transcription",
+    "transcriptionPrice": 0.000667,
+    "tags": ["transcription", "fast"]
+  },
   "openai/gpt-5.4-mini": {
     "_comment_e": "Entry with custom rate limits. rpmLimit and tpmLimit override the provider-level defaults in providers.json. rateLimitScope:'model' means the limit is per-model; 'provider' means it joins the provider-level pool.",
     "model": "gpt-5.4-mini",

package/config/curated.schema.json CHANGED

@@ -45,7 +45,7 @@
         "inputFormat": {
           "type": "array",
           "description": "Accepted input modalities. Defaults to ['text'].",
-          "items": { "type": "string", "enum": ["text", "image", "video"] },
+          "items": { "type": "string", "enum": ["text", "image", "video", "audio"] },
           "minItems": 1,
           "uniqueItems": true
         },
@@ -56,9 +56,9 @@
         "sdk": { "type": "string", "description": "SDK adapter to use (some providers require this)." },
         "type": {
           "type": "string",
-          "enum": ["model", "image"],
+          "enum": ["model", "image", "transcription"],
           "default": "model",
-          "description": "'model' for chat/completion, 'image' for image generation."
+          "description": "'model' for chat/completion, 'image' for image generation, 'transcription' for speech-to-text."
         },
         "label": { "type": "string", "description": "Human-readable name shown in UIs." },
         "description": { "type": "string" },
@@ -137,6 +137,7 @@
         "imagePrice": { "type": "number", "minimum": 0, "description": "USD per generated image (image-type entries)." },
         "imageEndpoint": { "type": "string", "description": "Provider-side image endpoint name." },
         "imageDefaultSize": { "type": "string", "description": "Default size when envelope omits one (e.g. '1024x1024')." },
+        "transcriptionPrice": { "type": "number", "minimum": 0, "description": "USD per audio minute (transcription-type entries). Token-billed transcription models (OpenAI gpt-4o-*-transcribe) use inputPrice/outputPrice instead." },
         "rpmLimit": { "type": "integer", "minimum": 1, "description": "Requests per minute. Overrides provider default." },
         "tpmLimit": { "type": "integer", "minimum": 1, "description": "Tokens per minute. Overrides provider default." },

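The per-minute unit of `transcriptionPrice` means an hourly provider quote has to be converted before it goes into the catalog. A minimal sketch of that arithmetic, using the $0.04/hour figure quoted in the example entry's comment; `perMinuteFromHourly` is a hypothetical helper and not part of mohdel:

```javascript
// Hypothetical helper (not part of mohdel): convert a provider's hourly
// quote into the per-minute unit that `transcriptionPrice` expects.
function perMinuteFromHourly (usdPerHour, decimals = 6) {
  const factor = 10 ** decimals
  return Math.round((usdPerHour / 60) * factor) / factor
}

// 0.04 / 60 = 0.000666..., which rounds to the value used in the example entry.
const transcriptionPrice = perMinuteFromHourly(0.04)
console.log(transcriptionPrice) // 0.000667
```

Rounding to six decimals is an assumption chosen to match the example entry; the schema only requires a non-negative number.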
package/js/core/transcription.js ADDED

@@ -0,0 +1,70 @@
+/**
+ * Transcription (voice → text) envelope and result.
+ *
+ * Separate call path from `CallEnvelope` / `AnswerResult`: transcription
+ * is a single synchronous request/response (no streaming) against a
+ * provider's `/audio/transcriptions` endpoint.
+ * Result shape: `{ status, text, language, durationSeconds, cost, timestamps }`.
+ *
+ * @module core/transcription
+ */
+/**
+ * @typedef {object} AudioRef
+ * @property {string} fileUri
+ *   `file://` or `data:` URI. Remote `https://` audio is not
+ *   supported — providers require multipart upload, so the caller
+ *   owns the download.
+ * @property {string} mimeType  e.g. "audio/mpeg", "audio/wav".
+ */
+/**
+ * @typedef {object} TranscriptionEnvelope
+ *
+ * @property {string} callId
+ * @property {string} authId
+ * @property {import('./envelope.js').Auth} auth
+ * @property {string} [traceparent]
+ * @property {string} [baggage]
+ *
+ * @property {import('./model-id.js').ModelId} model
+ *   Full mohdel id — `"<provider>/<bare>"`. Same shape as
+ *   `CallEnvelope.model` (see `envelope.js`).
+ * @property {AudioRef} audio
+ *
+ * @property {string} [language]  ISO-639-1 hint (e.g. "en", "fr").
+ * @property {string} [prompt]    Spelling/context hint forwarded to the provider.
+ */
+/**
+ * @typedef {object} TranscriptionResult
+ *
+ * @property {'completed'} status
+ *   Transcriptions are one-shot — no `incomplete` state.
+ * @property {string} text
+ * @property {string | null} language
+ *   Detected (or echoed) language when the provider reports one.
+ * @property {number | null} durationSeconds
+ *   Audio duration as reported by the provider; null when not reported.
+ * @property {number} [inputTokens]
+ *   Present only for token-billed providers (OpenAI gpt-4o-*-transcribe).
+ * @property {number} [outputTokens]
+ * @property {number} cost
+ *   USD. `transcriptionPrice` (per audio minute) × duration when the
+ *   provider reports duration; token pricing fallback otherwise; 0 when
+ *   the spec carries no usable price.
+ * @property {{start: string, first: string, end: string}} timestamps
+ *   hrtime-bigint-as-string. `first` = `end` (no streaming).
+ */
+export const TRANSCRIPTION_ENVELOPE_FIELDS = Object.freeze([
+  'callId',
+  'authId',
+  'auth',
+  'traceparent',
+  'baggage',
+  'model',
+  'audio',
+  'language',
+  'prompt'
+])
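
The frozen field list lends itself to whitelisting caller input before dispatch. A sketch under that assumption: `pickEnvelopeFields` and the sample values are hypothetical, and the list is copied from the export above so the snippet runs standalone.

```javascript
// Copied from TRANSCRIPTION_ENVELOPE_FIELDS in the diff so the sketch is standalone.
const TRANSCRIPTION_ENVELOPE_FIELDS = Object.freeze([
  'callId', 'authId', 'auth', 'traceparent', 'baggage',
  'model', 'audio', 'language', 'prompt'
])

// Hypothetical helper: keep only known, defined envelope fields.
function pickEnvelopeFields (input) {
  const out = {}
  for (const key of TRANSCRIPTION_ENVELOPE_FIELDS) {
    if (input[key] !== undefined) out[key] = input[key]
  }
  return out
}

const envelope = pickEnvelopeFields({
  callId: 'call-1',
  authId: 'local',
  auth: { key: 'sk-placeholder' }, // placeholder, not a real key
  model: 'groq/whisper-large-v3-turbo',
  audio: { fileUri: 'file:///tmp/meeting.mp3', mimeType: 'audio/mpeg' },
  language: 'en',
  temperature: 0.2 // not an envelope field, so it is dropped
})
console.log(Object.keys(envelope))
```

Whether mohdel itself uses the list this way is not visible in the diff; the export may serve validation or wire encoding instead.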

package/js/factory/bridge.js CHANGED

@@ -22,6 +22,7 @@
 import { run } from '../session/run.js'
 import { runImage } from '../session/run_image.js'
+import { runTranscription } from '../session/run_transcription.js'
 import { MohdelError, Severity } from '../../src/lib/errors.js'
 import { createRealtimeDeltaBuffer } from '../../src/lib/utils.js'
@@ -149,6 +150,39 @@ export async function runAnswerImage ({ provider, model, configuration, prompt,
   return out.result
 }
+/**
+ * Run a `transcribe()` call through the /session runtime.
+ *
+ * @param {object} args
+ * @param {string} args.provider
+ * @param {string} args.model
+ * @param {any} args.configuration
+ * @param {{fileUri: string, mimeType: string}} args.audio
+ * @param {any} [args.options]    `language` / `prompt` map onto the
+ *                                envelope; `callId` / `authId` are
+ *                                transport metadata.
+ * @param {any} [args.spec]       modelSpec passthrough so the adapter
+ *                                picks up `model` and
+ *                                `transcriptionPrice` without
+ *                                re-reading the catalog.
+ * @returns {Promise<any>}
+ */
+export async function runAnswerTranscription ({ provider, model, configuration, audio, options = {}, spec }) {
+  const envelope = {
+    callId: options.callId || newCallId(),
+    authId: options.authId || 'local',
+    auth: configToAuth(configuration),
+    model: `${provider}/${model}`,
+    audio
+  }
+  if (options.language) envelope.language = options.language
+  if (options.prompt) envelope.prompt = options.prompt
+  const out = await runTranscription(envelope, spec ? { spec } : {})
+  if (!out.ok) throw fromTypedError(out.error, { provider, model })
+  return out.result
+}
 /**
  * @param {object} args
  * @param {string} args.modelKey      Mohdel catalog key `<provider>/<bare>`. The

package/js/session/adapters/_pricing.js CHANGED

@@ -100,6 +100,38 @@ export function costFor (model, usage) {
   return computeCost(getSpec(model), usage)
 }
+/**
+ * Cost of a transcription call.
+ *
+ * Providers bill speech-to-text two ways, and the catalog supports both:
+ *
+ *   - `transcriptionPrice` — flat USD per audio **minute** (Groq,
+ *     Mistral; the industry quoting unit). Used when the provider
+ *     reported the audio duration.
+ *   - token pricing (`inputPrice`/`outputPrice`) — OpenAI's
+ *     gpt-4o-*-transcribe models report token usage instead of
+ *     duration; falls through to `computeCost`.
+ *
+ * Duration wins when both are available. Unknown models or specs
+ * without prices return `0` — same graceful degradation as
+ * `computeCost`.
+ *
+ * @param {any} spec  Catalog entry, or `undefined`.
+ * @param {{durationSeconds?: number | null, inputTokens?: number, outputTokens?: number}} usage
+ * @returns {number}
+ */
+export function computeTranscriptionCost (spec, usage) {
+  if (!spec) return 0
+  const seconds = usage.durationSeconds
+  if (typeof seconds === 'number' && seconds > 0 && typeof spec.transcriptionPrice === 'number') {
+    return round((seconds / 60) * spec.transcriptionPrice)
+  }
+  if (usage.inputTokens || usage.outputTokens) {
+    return computeCost(spec, { inputTokens: usage.inputTokens, outputTokens: usage.outputTokens })
+  }
+  return 0
+}
 /**
  * Test convenience: inject pricing-only specs by model id. Wraps
  * `setCatalog` with the `{input, output, thinking?}` shape used in
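
The branch order in `computeTranscriptionCost` (duration first, tokens second, otherwise 0) can be exercised in isolation. In the sketch below, `round` and `computeCost` are stand-ins, since the real helpers are not shown in this diff. The 8-decimal rounding, the USD-per-million-token unit, and the token prices are all assumptions made for illustration.

```javascript
// Stand-ins for helpers that live in _pricing.js but are not shown in this diff.
// Assumptions: 8-decimal rounding, token prices quoted in USD per million tokens.
const round = (n) => Math.round(n * 1e8) / 1e8
const computeCost = (spec, { inputTokens = 0, outputTokens = 0 }) =>
  round((inputTokens * (spec.inputPrice ?? 0) + outputTokens * (spec.outputPrice ?? 0)) / 1e6)

// Same branch order as the function in the diff.
function computeTranscriptionCost (spec, usage) {
  if (!spec) return 0
  const seconds = usage.durationSeconds
  if (typeof seconds === 'number' && seconds > 0 && typeof spec.transcriptionPrice === 'number') {
    return round((seconds / 60) * spec.transcriptionPrice)
  }
  if (usage.inputTokens || usage.outputTokens) {
    return computeCost(spec, { inputTokens: usage.inputTokens, outputTokens: usage.outputTokens })
  }
  return 0
}

const perMinute = { transcriptionPrice: 0.000667 }
const tokenBilled = { inputPrice: 1.25, outputPrice: 5 } // illustrative prices, not real quotes

// Ten minutes of audio at the per-minute rate.
console.log(computeTranscriptionCost(perMinute, { durationSeconds: 600 })) // 0.00667
// No duration reported, so token pricing applies.
console.log(computeTranscriptionCost(tokenBilled, { durationSeconds: null, inputTokens: 2000, outputTokens: 400 })) // 0.0045
// Neither duration nor tokens: graceful zero.
console.log(computeTranscriptionCost(perMinute, { durationSeconds: null })) // 0
```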

package/js/session/adapters/transcription/fake.js ADDED

@@ -0,0 +1,53 @@
+/**
+ * Fake transcription adapter — scenario-driven for tests and bug
+ * reproductions. Never calls a real API.
+ *
+ * Mirrors the `fake` image adapter shape: the envelope's `prompt`
+ * field carries a JSON scenario spec; the `mode` key picks a
+ * behavior. Missing / non-JSON prompts fall through to `mode: "ok"`.
+ *
+ * ## Modes
+ *
+ * | mode    | params                          | behavior                       |
+ * |---------|---------------------------------|--------------------------------|
+ * | `ok`    | `text?`, `durationSeconds?`     | returns a canned transcription |
+ * | `error` | `type`, `message`               | throws a tagged error          |
+ *
+ * @module session/adapters/transcription/fake
+ */
+/**
+ * @param {import('#core/transcription.js').TranscriptionEnvelope} envelope
+ * @returns {Promise<import('#core/transcription.js').TranscriptionResult>}
+ */
+export async function fakeTranscription (envelope) {
+  const scenario = parseScenario(envelope.prompt)
+  const mode = scenario.mode ?? 'ok'
+  if (mode === 'error') {
+    const err = new Error(scenario.message || 'fake transcription error')
+    err.typed = {
+      message: scenario.message || 'fake transcription error',
+      severity: 'error',
+      retryable: !!scenario.retryable,
+      type: scenario.type || 'PROVIDER_ERROR'
+    }
+    throw err
+  }
+  const now = `${process.hrtime.bigint()}`
+  return {
+    status: 'completed',
+    text: scenario.text ?? `fake transcript for ${envelope.callId}`,
+    language: scenario.language ?? 'en',
+    durationSeconds: scenario.durationSeconds ?? 1,
+    cost: 0,
+    timestamps: { start: now, first: now, end: now }
+  }
+}
+/** @param {unknown} prompt */
+function parseScenario (prompt) {
+  if (typeof prompt !== 'string') return {}
+  try { return JSON.parse(prompt) || {} } catch { return {} }
+}
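
Because the scenario travels in the envelope's `prompt` field, a test only has to serialize a small object. A sketch of how such prompts might be composed and how they parse; `parseScenario` is copied from the adapter above so the snippet runs standalone, and the `RATE_LIMITED` type is a made-up value for illustration.

```javascript
// Copied from the fake adapter in the diff so the sketch runs standalone.
function parseScenario (prompt) {
  if (typeof prompt !== 'string') return {}
  try { return JSON.parse(prompt) || {} } catch { return {} }
}

// Scenario prompts a test might send (values are illustrative).
const okPrompt = JSON.stringify({ mode: 'ok', text: 'hello world', durationSeconds: 12 })
const errPrompt = JSON.stringify({ mode: 'error', type: 'RATE_LIMITED', message: 'slow down', retryable: true })

console.log(parseScenario(okPrompt).text)                      // hello world
console.log(parseScenario(errPrompt).mode)                     // error
// A plain spelling hint is not JSON, so it falls through to the default mode.
console.log(parseScenario('spell it "Voxtral"').mode ?? 'ok')  // ok
console.log(parseScenario(undefined).mode ?? 'ok')             // ok
```

The adapter also reads `retryable` and `language` from the scenario, although the modes table in its header comment lists only `text`, `durationSeconds`, `type`, and `message`.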

package/js/session/adapters/transcription/index.js ADDED

@@ -0,0 +1,58 @@
+/**
+ * Transcription-adapter registry. Mirrors session/adapters/image but
+ * scoped to speech-to-text providers.
+ *
+ * Groq, OpenAI, and Mistral all expose the same OpenAI-compatible
+ * `POST /audio/transcriptions` multipart endpoint, so each entry is
+ * the shared adapter bound to per-provider knobs:
+ *
+ *   - `baseURL` — the provider's OpenAI-compatible API root.
+ *   - `responseFormat` — `verbose_json` where supported (returns
+ *     `duration`, needed for per-minute pricing). OpenAI's
+ *     gpt-4o-*-transcribe models reject `verbose_json` (plain `json`
+ *     returns token usage instead); Mistral rejects the field
+ *     entirely (its default response already carries
+ *     `usage.prompt_audio_seconds`).
+ *
+ * @module session/adapters/transcription
+ */
+import { createTranscriptionAdapter } from './openai_compatible.js'
+import { fakeTranscription } from './fake.js'
+const TRANSCRIPTION_ADAPTERS = {
+  groq: createTranscriptionAdapter({
+    baseURL: 'https://api.groq.com/openai/v1',
+    responseFormat: 'verbose_json'
+  }),
+  openai: createTranscriptionAdapter({
+    baseURL: 'https://api.openai.com/v1',
+    responseFormat: 'json'
+  }),
+  mistral: createTranscriptionAdapter({
+    baseURL: 'https://api.mistral.ai/v1'
+  }),
+  fake: fakeTranscription
+}
+/**
+ * @param {string} provider
+ * @returns {(
+ *   env: import('#core/transcription.js').TranscriptionEnvelope,
+ *   deps?: any
+ * ) => Promise<import('#core/transcription.js').TranscriptionResult>}
+ */
+export function getTranscriptionAdapter (provider) {
+  const adapter = TRANSCRIPTION_ADAPTERS[provider]
+  if (!adapter) throw new Error(`no transcription adapter for provider: ${provider}`)
+  return adapter
+}
+/**
+ * Whether the provider has a transcription adapter registered.
+ *
+ * @param {string} provider
+ */
+export function isTranscriptionProvider (provider) {
+  return Object.prototype.hasOwnProperty.call(TRANSCRIPTION_ADAPTERS, provider)
+}
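
`isTranscriptionProvider` checks own properties rather than using `in` or a truthiness test, which matters for a plain object registry. A small sketch of the difference, using a stand-in registry with stubbed adapters (the names `ADAPTERS` and `isRegistered` are hypothetical):

```javascript
// Stand-in registry with the same shape as TRANSCRIPTION_ADAPTERS (adapters stubbed out).
const ADAPTERS = {
  groq: async () => ({ status: 'completed', text: '' }),
  fake: async () => ({ status: 'completed', text: '' })
}

// Same own-property check as isTranscriptionProvider in the diff.
const isRegistered = (provider) =>
  Object.prototype.hasOwnProperty.call(ADAPTERS, provider)

console.log(isRegistered('groq'))        // true
console.log(isRegistered('toString'))    // false
// Inherited keys are visible to `in` and to bracket lookup on a plain object.
console.log('toString' in ADAPTERS)      // true
console.log(typeof ADAPTERS['toString']) // function
```

A bare bracket lookup followed by a truthiness test would accept inherited names such as `toString`, so callers that gate on the own-property check first are on safer ground.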

package/js/session/adapters/transcription/openai_compatible.js ADDED

@@ -0,0 +1,177 @@
+/**
+ * Shared transcription adapter for OpenAI-compatible
+ * `POST <baseURL>/audio/transcriptions` endpoints (multipart upload).
+ *
+ * Groq, Mistral, and OpenAI all implement the same endpoint shape;
+ * only the base URL and the supported `response_format` differ, so
+ * one adapter covers all three. Per-provider knobs are bound via
+ * `createTranscriptionAdapter` in `./index.js`.
+ *
+ * Duration extraction (for per-minute pricing) is response-shape
+ * dependent:
+ *   - `body.duration`                    — whisper `verbose_json` (Groq)
+ *   - `body.usage.seconds`               — OpenAI duration-type usage
+ *   - `body.usage.prompt_audio_seconds`  — Mistral Voxtral
+ * OpenAI's gpt-4o-*-transcribe models report token usage instead;
+ * `computeTranscriptionCost` falls back to token pricing for those.
+ *
+ * @module session/adapters/transcription/openai_compatible
+ */
+import { readFile } from 'node:fs/promises'
+import { basename } from 'node:path'
+import { getSpec } from '../_catalog.js'
+import { classifyProviderError } from '../_errors.js'
+import { computeTranscriptionCost } from '../_pricing.js'
+import { catalogKey, bareOf } from '#core/model-id.js'
+/**
+ * @param {{baseURL: string, responseFormat?: string}} config
+ * @returns {(
+ *   env: import('#core/transcription.js').TranscriptionEnvelope,
+ *   deps?: {fetch?: typeof fetch, spec?: any}
+ * ) => Promise<import('#core/transcription.js').TranscriptionResult>}
+ */
+export function createTranscriptionAdapter ({ baseURL, responseFormat }) {
+  return async function transcription (envelope, deps = {}) {
+    const fetchFn = deps.fetch ?? globalThis.fetch
+    const spec = deps.spec ?? getSpec(catalogKey(envelope.model)) ?? {}
+    const start = String(process.hrtime.bigint())
+    const audio = await loadAudio(envelope.audio)
+    const form = new FormData()
+    form.append('model', spec.model ?? bareOf(envelope.model))
+    form.append('file', new Blob([audio.bytes], { type: audio.mimeType }), audio.filename)
+    if (responseFormat) form.append('response_format', responseFormat)
+    if (envelope.language) form.append('language', envelope.language)
+    if (envelope.prompt) form.append('prompt', envelope.prompt)
+    const root = (envelope.auth.baseURL || baseURL).replace(/\/$/, '')
+    let res
+    try {
+      res = await fetchFn(`${root}/audio/transcriptions`, {
+        method: 'POST',
+        headers: { Authorization: `Bearer ${envelope.auth.key}` },
+        body: form
+      })
+    } catch (e) {
+      throw typedError(classifyProviderError(e, envelope.auth?.key).message, 'NET_ERROR', true)
+    }
+    if (!res.ok) {
+      const text = await res.text().catch(() => '')
+      throw fromHttpStatus(res.status, 'transcription request failed', text.slice(0, 200))
+    }
+    const body = await res.json()
+    const durationSeconds = extractDuration(body)
+    const tokens = extractTokens(body)
+    const cost = computeTranscriptionCost(spec, { durationSeconds, ...tokens })
+    const end = String(process.hrtime.bigint())
+    return {
+      status: 'completed',
+      text: typeof body.text === 'string' ? body.text : '',
+      language: typeof body.language === 'string' ? body.language : null,
+      durationSeconds,
+      ...tokens,
+      cost,
+      timestamps: { start, first: end, end }
+    }
+  }
+}
+/** @param {any} body */
+function extractDuration (body) {
+  if (typeof body.duration === 'number') return body.duration
+  const u = body.usage
+  if (u && typeof u === 'object') {
+    if (typeof u.seconds === 'number') return u.seconds
+    if (typeof u.prompt_audio_seconds === 'number') return u.prompt_audio_seconds
+  }
+  return null
+}
+/** @param {any} body */
+function extractTokens (body) {
+  const u = body.usage
+  if (!u || typeof u !== 'object') return {}
+  const out = {}
+  if (typeof u.input_tokens === 'number') out.inputTokens = u.input_tokens
+  if (typeof u.output_tokens === 'number') out.outputTokens = u.output_tokens
+  return out
+}
+// Multipart filename drives format sniffing on the provider side, so
+// data: URIs need an extension synthesized from the MIME subtype.
+const EXT_BY_MIME = {
+  'audio/mpeg': 'mp3',
+  'audio/mp4': 'm4a',
+  'audio/x-m4a': 'm4a',
+  'audio/wav': 'wav',
+  'audio/x-wav': 'wav',
+  'audio/webm': 'webm',
+  'audio/flac': 'flac',
+  'audio/x-flac': 'flac',
+  'audio/ogg': 'ogg',
+  'audio/opus': 'opus'
+}
+/**
+ * `file://` and `data:` URIs only — providers require multipart
+ * upload, so remote `https://` audio would mean mohdel silently
+ * downloading arbitrary URLs; the caller owns that step.
+ *
+ * @param {import('#core/transcription.js').AudioRef} audio
+ * @returns {Promise<{bytes: Buffer, mimeType: string, filename: string}>}
+ */
+export async function loadAudio (audio) {
+  if (!audio?.fileUri || !audio?.mimeType) {
+    throw typedError('transcription requires audio {fileUri, mimeType}', 'SESSION_INVALID_AUDIO', false)
+  }
+  const { fileUri, mimeType } = audio
+  if (fileUri.startsWith('file://')) {
+    const path = fileUri.replace(/^file:\/\//, '')
+    let bytes
+    try {
+      bytes = await readFile(path)
+    } catch (e) {
+      throw typedError('audio file unreadable', 'SESSION_INVALID_AUDIO', false, messageOf(e))
+    }
+    return { bytes, mimeType, filename: basename(path) }
+  }
+  if (fileUri.startsWith('data:')) {
+    const parts = fileUri.split(',')
+    if (parts.length < 2) {
+      throw typedError('malformed audio data URI', 'SESSION_INVALID_AUDIO', false)
+    }
+    const ext = EXT_BY_MIME[mimeType] || mimeType.split('/').pop() || 'bin'
+    return { bytes: Buffer.from(parts[1], 'base64'), mimeType, filename: `audio.${ext}` }
+  }
+  throw typedError(
+    `unsupported audio URI scheme: ${fileUri.slice(0, 32)}…`,
+    'SESSION_INVALID_AUDIO',
+    false
+  )
+}
+function fromHttpStatus (status, message, detail) {
+  const typed = classifyProviderError({ status })
+  // Keep the classifier's message (stable/machine-readable); response
+  // body snippets go to `detail` only (F45).
+  return typedError(typed.message, typed.type, typed.retryable, detail ? `${message}: ${detail}` : message)
+}
+function typedError (message, type, retryable, detail) {
+  const err = new Error(message)
+  const typed = { message, severity: retryable ? 'warn' : 'error', retryable, type }
+  if (detail) typed.detail = detail
+  err.typed = typed
+  return err
+}
+/** @param {unknown} e */
+function messageOf (e) {
+  return e instanceof Error ? e.message : String(e)
+}
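
The three duration locations named in the adapter's header comment can be checked against sample response bodies. In this sketch `extractDuration` is copied from the diff; the bodies are illustrative: field names follow the header comment, while the values are made up and are not captured provider output.

```javascript
// Copied from the adapter in the diff so the sketch runs standalone.
function extractDuration (body) {
  if (typeof body.duration === 'number') return body.duration
  const u = body.usage
  if (u && typeof u === 'object') {
    if (typeof u.seconds === 'number') return u.seconds
    if (typeof u.prompt_audio_seconds === 'number') return u.prompt_audio_seconds
  }
  return null
}

// Illustrative bodies, one per shape the adapter understands.
const whisperVerbose = { text: 'hi', duration: 42.5 }                    // verbose_json
const durationUsage = { text: 'hi', usage: { seconds: 43 } }             // duration-type usage
const voxtral = { text: 'hi', usage: { prompt_audio_seconds: 41 } }      // Mistral Voxtral
const tokenUsage = { text: 'hi', usage: { input_tokens: 120, output_tokens: 30 } }

console.log(extractDuration(whisperVerbose)) // 42.5
console.log(extractDuration(durationUsage))  // 43
console.log(extractDuration(voxtral))        // 41
// Token-billed responses carry no duration, so pricing falls back to tokens.
console.log(extractDuration(tokenUsage))     // null
```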

package/js/session/run_transcription.js ADDED

@@ -0,0 +1,64 @@
+/**
+ * Dispatch a TranscriptionEnvelope to the matching transcription
+ * adapter.
+ *
+ * Transcription is a single request/response — no streaming — so
+ * this returns a Promise of `TranscriptionResult` rather than an
+ * event generator. On adapter failure, the resolved error is a
+ * `TypedError` (structured, serializable) rather than a thrown JS
+ * `Error`.
+ *
+ * Like the image path, transcription skips rate-limit and cooldown —
+ * low-frequency one-shots that don't justify the per-call tracking
+ * overhead.
+ *
+ * @module session/run_transcription
+ */
+import { getTranscriptionAdapter } from './adapters/transcription/index.js'
+import { classifyProviderError } from './adapters/_errors.js'
+import { providerOf } from '#core/model-id.js'
+/**
+ * @param {import('#core/transcription.js').TranscriptionEnvelope} envelope
+ * @param {{
+ *   resolveAdapter?: (provider: string) => (
+ *     env: import('#core/transcription.js').TranscriptionEnvelope,
+ *     deps?: any
+ *   ) => Promise<import('#core/transcription.js').TranscriptionResult>,
+ *   spec?: any
+ * }} [options]
+ * @returns {Promise<
+ *   | {ok: true, result: import('#core/transcription.js').TranscriptionResult}
+ *   | {ok: false, error: import('#core/errors.js').TypedError}
+ * >}
+ */
+export async function runTranscription (envelope, { resolveAdapter = getTranscriptionAdapter, spec } = {}) {
+  let adapter
+  try {
+    adapter = resolveAdapter(providerOf(envelope.model))
+  } catch (e) {
+    return {
+      ok: false,
+      error: {
+        message: messageOf(e),
+        severity: 'error',
+        retryable: false,
+        type: 'SESSION_UNKNOWN_PROVIDER'
+      }
+    }
+  }
+  try {
+    const result = await adapter(envelope, spec ? { spec } : {})
+    return { ok: true, result }
+  } catch (e) {
+    const typed = /** @type {any} */(e).typed || classifyProviderError(e, envelope.auth?.key)
+    return { ok: false, error: typed }
+  }
+}
+/** @param {unknown} e */
+function messageOf (e) {
+  return e instanceof Error ? e.message : String(e)
+}
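
The `{ ok, result } | { ok, error }` return contract means callers branch on `ok` instead of wrapping the call in `try`/`catch`. A minimal sketch of that pattern with stub adapters: `runSketch` is a hypothetical stand-in for `runTranscription`, and its generic fallback error replaces the real `classifyProviderError` call, which is not shown in this diff.

```javascript
// Hypothetical stand-in for runTranscription: same return contract, stub adapters.
async function runSketch (adapter, envelope) {
  try {
    return { ok: true, result: await adapter(envelope) }
  } catch (e) {
    // Prefer the adapter's structured error; otherwise build a generic one
    // (the real code calls classifyProviderError here).
    const typed = e.typed ?? {
      message: String(e?.message ?? e),
      severity: 'error',
      retryable: false,
      type: 'UNKNOWN'
    }
    return { ok: false, error: typed }
  }
}

const okAdapter = async (env) => ({ status: 'completed', text: `transcript for ${env.callId}` })
const failingAdapter = async () => {
  const err = new Error('rate limited')
  // Illustrative typed payload; the type name is made up.
  err.typed = { message: 'rate limited', severity: 'warn', retryable: true, type: 'RATE_LIMITED' }
  throw err
}

async function main () {
  const good = await runSketch(okAdapter, { callId: 'c1' })
  const bad = await runSketch(failingAdapter, { callId: 'c2' })
  if (good.ok) console.log(good.result.text)
  if (!bad.ok) console.log(bad.error.type, bad.error.retryable)
  return { good, bad }
}

const done = main()
```

Resolving with a serializable error object rather than rejecting keeps the failure shape identical whether the call ran in-process or crossed a process boundary, which is presumably why the bridge rethrows only at the outermost layer.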

package/package.json CHANGED

@@ -1,12 +1,12 @@
 {
   "name": "mohdel",
-  "version": "0.110.0",
+  "version": "0.111.0",
   "license": "MIT",
   "author": {
     "name": "Christophe Le Bars",
     "email": "clb@toort.net"
   },
-  "description": "Self-hosted LLM gateway with an embeddable SDK. Process-isolated, OpenTelemetry-native inference across 11 providers — streaming, tools, thinking control — without orchestration. Use the Node factory in-process, or run thin-gate for fault isolation and any-language HTTP callers.",
+  "description": "Self-hosted LLM gateway and SDK for Node — a LiteLLM-style unified API for 11 providers (Anthropic, OpenAI, Gemini, Mistral, Groq, xAI, DeepSeek, OpenRouter, …) with per-call USD cost tracking, streaming, tool calls, vision, speech-to-text, and built-in OpenTelemetry. Run in-process, or behind the process-isolated thin-gate for fault containment.",
   "type": "module",
   "repository": {
     "type": "git",
@@ -87,10 +87,10 @@
     "@opentelemetry/exporter-trace-otlp-grpc": "^0.218.0",
     "@opentelemetry/sdk-node": "^0.218.0",
     "chalk": "^5.4.0",
-    "mohdel-thin-gate-linux-x64-gnu": "0.110.0"
+    "mohdel-thin-gate-linux-x64-gnu": "0.111.0"
   },
   "dependencies": {
-    "@anthropic-ai/sdk": "^0.102.0",
+    "@anthropic-ai/sdk": "^0.104.1",
     "@cerebras/cerebras_cloud_sdk": "^1.61.1",
     "@google/genai": "^2.8.0",
     "@opentelemetry/api": "^1.9.1",

package/src/cli/ask.js CHANGED

@@ -5,8 +5,8 @@ const noop = () => {}
 // Friendly next-step hints for common ask-time failures. Pure pattern match on
 // err.message — keeps the lib layer neutral, but gives CLI users a copy-pasteable
-// command instead of just an error.
-const hintsForError = (err, modelId) => {
+// command instead of just an error. Shared with `mo transcribe`.
+export const hintsForError = (err, modelId) => {
   const msg = String(err?.message || '')
   const detail = String(err?.detail || '')
   const both = `${msg}\n${detail}`

package/src/cli/index.js CHANGED

@@ -55,6 +55,7 @@ Commands:
   ratelimit provider rm <p>               Remove provider-level limits
   ask <provider/model> [prompt]           One-shot inference (pipeable)
+  transcribe <provider/model> <file>      Speech → text from an audio file
   default                                 Set default model (interactive)
   doctor                                  Check that your install is wired up
@@ -133,6 +134,9 @@ if (resolved === 'default') {
 } else if (resolved === 'ask') {
   const { runAsk } = await import('./ask.js')
   await runAsk(resolvedArgs)
+} else if (resolved === 'transcribe') {
+  const { runTranscribe } = await import('./transcribe.js')
+  await runTranscribe(resolvedArgs)
 } else if (resolved === 'model') {
   const { runModel } = await import('./model.js')
   await runModel(resolvedArgs)

package/src/cli/model.js CHANGED

@@ -269,7 +269,7 @@ Examples:
 Required fields (asked if not pre-filled):
   model        the literal id sent to the provider's API
   creator      who trained the model (e.g. anthropic, openai, alibaba)
-  inputFormat  subset of [text, image, video]
+  inputFormat  subset of [text, image, video, audio]
 See docs/CATALOG.md for the full field reference, and
 config/curated.example.json for ready-to-copy entries.`)

package/src/cli/transcribe.js ADDED

@@ -0,0 +1,145 @@
+import { resolve, extname } from 'node:path'
+import mohdel, { silent } from '../lib/index.js'
+import { loadDefaultEnv } from '../lib/common.js'
+import { hintsForError } from './ask.js'
+const noop = () => {}
+const MIME_BY_EXT = {
+  '.mp3': 'audio/mpeg',
+  '.mpga': 'audio/mpeg',
+  '.m4a': 'audio/mp4',
+  '.mp4': 'audio/mp4',
+  '.wav': 'audio/wav',
+  '.webm': 'audio/webm',
+  '.flac': 'audio/flac',
+  '.ogg': 'audio/ogg',
+  '.opus': 'audio/opus'
+}
+export async function runTranscribe (args) {
+  if (args.includes('-h') || args.includes('--help')) {
+    console.log(`mohdel transcribe — speech → text, pipeable
+Usage:
+  mo transcribe <model> <audio-file>
+Options:
+  --language <iso>     ISO-639-1 language hint (e.g. en, fr)
+  --prompt <text>      Spelling/context hint forwarded to the provider
+  --mime <type>        Override the MIME type guessed from the extension
+  --json               Output full result as JSON
+  -v, --verbose        Show debug info on stderr
+Output:
+  stdout: transcript text (raw — or JSON with --json)
+  stderr: model name + duration/cost summary
+Known extensions: ${Object.keys(MIME_BY_EXT).join(' ')}
+Examples:
+  mo transcribe groq/whisper-large-v3-turbo meeting.mp3
+  mo transcribe mistral/voxtral-mini-transcribe interview.wav --language fr
+  mo transcribe groq/whisper-large-v3-turbo memo.m4a --json | jq .cost`)
+    process.exit(0)
+  }
+  loadDefaultEnv()
+  const flagVal = (name) => {
+    const idx = args.indexOf(name)
+    if (idx === -1) return undefined
+    const val = args[idx + 1]
+    args.splice(idx, 2)
+    return val
+  }
+  const flag = (name) => {
+    const idx = args.indexOf(name)
+    if (idx === -1) return false
+    args.splice(idx, 1)
+    return true
+  }
+  const json = flag('--json')
+  const verbose = flag('--verbose') || flag('-v')
+  const language = flagVal('--language')
+  const prompt = flagVal('--prompt')
+  const mimeOverride = flagVal('--mime')
+  const [modelId, file] = args
+  if (!modelId || !file) {
+    console.error('Usage: mo transcribe <model> <audio-file>')
+    process.exit(1)
+  }
+  const mimeType = mimeOverride || MIME_BY_EXT[extname(file).toLowerCase()]
+  if (!mimeType) {
+    console.error(`Unknown audio extension '${extname(file)}'. Pass --mime <type> (e.g. --mime audio/mpeg).`)
+    process.exit(1)
+  }
+  const log = verbose ? (...args) => process.stderr.write(`${args.map(a => typeof a === 'string' ? a : JSON.stringify(a)).join(' ')}\n`) : noop
+  const logger = {
+    ...silent,
+    debug: verbose ? log : noop,
+    info: log,
+    warn: log,
+    error: log,
+    fatal: log
+  }
+  const mo = await mohdel({ logger })
+  let model
+  try {
+    model = mo.use(modelId)
+  } catch (err) {
+    console.error(err.message)
+    for (const h of hintsForError(err, modelId)) console.error(h)
+    process.exit(1)
+  }
+  const options = {}
+  if (language) options.language = language
+  if (prompt) options.prompt = prompt
+  process.stderr.write(`${model.id}\n`)
+  try {
+    const result = await model.transcribe(
+      { fileUri: `file://${resolve(file)}`, mimeType },
+      options
+    )
+    if (json) {
+      console.log(JSON.stringify({
+        model: model.id,
+        text: result.text,
+        language: result.language,
+        durationSeconds: result.durationSeconds,
+        inputTokens: result.inputTokens,
+        outputTokens: result.outputTokens,
+        cost: result.cost ?? null,
+        status: result.status
+      }, null, 2))
+    } else {
+      process.stdout.write(result.text)
+      if (result.text && !result.text.endsWith('\n')) process.stdout.write('\n')
+    }
+    const summary = []
+    if (result.durationSeconds != null) summary.push(`${result.durationSeconds}s audio`)
+    if (result.inputTokens) summary.push(`${result.inputTokens} in`)
+    if (result.outputTokens) summary.push(`${result.outputTokens} out`)
+    if (result.cost != null) summary.push(`$${result.cost.toFixed(4)}`)
+    const ts = result.timestamps
+    if (ts?.start && ts?.end) {
+      summary.push(`${Math.round(Number(BigInt(ts.end) - BigInt(ts.start)) / 1e6)}ms total`)
+    }
+    if (summary.length) process.stderr.write(`${summary.join(', ')}\n`)
+  } catch (err) {
+    console.error(`Error: ${err.detail || err.message}`)
+    for (const h of hintsForError(err, modelId)) console.error(h)
+    process.exit(1)
+  }
+}
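The argument handling in `runTranscribe` above pulls flags out of `args` with `splice`, so whatever remains is the positional `<model> <audio-file>` pair. A minimal sketch of that technique as standalone functions follows. `takeFlag`, `takeFlagValue`, and `guessMime` are hypothetical names, the MIME table is a subset of the one in the diff, and the missing-value check is an addition that the CLI code does not perform.

```javascript
// Subset of the extension table from the diff above.
const MIME_BY_EXT = {
  '.mp3': 'audio/mpeg',
  '.m4a': 'audio/mp4',
  '.wav': 'audio/wav',
  '.flac': 'audio/flac'
}

// Remove a boolean flag from args; report whether it was present.
function takeFlag (args, name) {
  const idx = args.indexOf(name)
  if (idx === -1) return false
  args.splice(idx, 1)
  return true
}

// Remove `name <value>` from args and return the value (undefined if absent).
// Assumption for this sketch: a missing value, or one that looks like another
// long flag, is treated as an error rather than silently consumed.
function takeFlagValue (args, name) {
  const idx = args.indexOf(name)
  if (idx === -1) return undefined
  const val = args[idx + 1]
  if (val === undefined || val.startsWith('--')) {
    throw new Error(`${name} requires a value`)
  }
  args.splice(idx, 2)
  return val
}

// Case-insensitive MIME lookup by extension. Simplified stand-in for
// `path.extname`; it does not special-case dotfiles inside directories.
function guessMime (file) {
  const dot = file.lastIndexOf('.')
  if (dot <= 0) return undefined
  return MIME_BY_EXT[file.slice(dot).toLowerCase()]
}
```

Because each helper mutates `args`, flags can appear before, between, or after the positionals, which matches the `--json` and `--language fr` placements in the help text's examples.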

package/src/lib/index.js CHANGED

@@ -14,7 +14,7 @@ import {
 } from './curated-cache.js'
 import { createRateLimiter } from '../../js/session/_rate_limiter.js'
 import { createCooldownTracker } from '../../js/session/_cooldown.js'
-import { runAnswer, runAnswerImage } from '../../js/factory/bridge.js'
+import { runAnswer, runAnswerImage, runAnswerTranscription } from '../../js/factory/bridge.js'
 import { startSpan, endSpanOk, endSpanError } from './tracing.js'
 import { isValidTag } from './schema.js'
 import { silent } from './logger.js'
@@ -644,6 +644,20 @@ const createModelProxy = (resolvedModelId, modelSpec, handlers, aliasOutputEffor
         }
       }
+      if (prop === 'transcribe') {
+        return async (audio, options = {}) => {
+          const { configuration } = await getRuntime()
+          return runAnswerTranscription({
+            provider: modelSpec.provider,
+            model: modelSpec.model ?? resolvedModelId.split('/').pop(),
+            configuration,
+            audio,
+            options,
+            spec: modelSpec
+          })
+        }
+      }
       if (prop === 'setRateLimit') {
         return async ({ rpm, tpm } = {}) => {
           const curatedCache = getCuratedCacheSnapshot()
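The new `transcribe` property on the model proxy takes `(audio, options = {})`. Going by the CLI code in this diff, `audio` is `{ fileUri, mimeType }` and `options` may carry `language` and `prompt`. A small sketch of assembling those two arguments is below; `buildTranscribeArgs` is a hypothetical helper, not part of mohdel, and any accepted option beyond the two the CLI sets is not shown in the diff.

```javascript
// Build the two arguments the CLI passes to `model.transcribe(audio, options)`.
// `buildTranscribeArgs` is a hypothetical helper for illustration only.
function buildTranscribeArgs (absolutePath, mimeType, { language, prompt } = {}) {
  // Mirrors the CLI: a plain `file://` prefix on an absolute path.
  const audio = { fileUri: `file://${absolutePath}`, mimeType }
  // Only set the hints that were actually provided, as the CLI does.
  const options = {}
  if (language) options.language = language
  if (prompt) options.prompt = prompt
  return [audio, options]
}
```

With a model obtained from `mo.use('<provider>/<model>')`, the call would be `const result = await model.transcribe(...buildTranscribeArgs('/tmp/meeting.mp3', 'audio/mpeg', { language: 'en' }))`. For paths containing spaces or `#`, Node's `pathToFileURL` from `node:url` produces a percent-encoded URI; whether mohdel expects the encoded or the raw form is not visible in this diff.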

package/src/lib/schema.js CHANGED

@@ -27,6 +27,7 @@ const fieldDefs = {
   imagePrice: { type: 'number' },
   imageEndpoint: { type: 'string' },
   imageDefaultSize: { type: 'string' },
+  transcriptionPrice: { type: 'number' },
   deprecated: { type: 'string' },
   suspended: { type: 'string' },
   rpmLimit: { type: 'number' },
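The hunk above registers `transcriptionPrice` as a numeric catalog field. The validator that consumes `fieldDefs` is not part of this diff, so the following is only a sketch of how a `{ type }` map like this could be checked against a catalog entry. `validateEntry` is a hypothetical name, and the price unit is left unstated because the diff does not give one.

```javascript
// A few field definitions copied from the hunk above.
const fieldDefs = {
  imagePrice: { type: 'number' },
  transcriptionPrice: { type: 'number' },
  deprecated: { type: 'string' },
  rpmLimit: { type: 'number' }
}

// Return a list of type mismatches for the known fields of one entry.
// Unknown fields are ignored here; a stricter validator might reject them.
function validateEntry (entry) {
  const errors = []
  for (const [key, value] of Object.entries(entry)) {
    const def = fieldDefs[key]
    if (!def) continue
    if (typeof value !== def.type) {
      errors.push(`${key}: expected ${def.type}, got ${typeof value}`)
    }
  }
  return errors
}
```

An entry such as `{ transcriptionPrice: '0.04' }` would be reported, since the value is a string rather than a number.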