npm - @drakulavich/parakeet-cli - Versions diffs - 0.5.3 → 0.6.1 - Mend

@drakulavich/parakeet-cli 0.5.3 → 0.6.1


Files changed (4)

package/README.md +51 -122
package/package.json +3 -2
package/src/models.ts +30 -3
package/src/__tests__/audio.test.ts +0 -28

package/README.md CHANGED

@@ -1,171 +1,100 @@
-# parakeet-cli
+# 🦜 parakeet-cli
-[![npm version](https://img.shields.io/npm/v/@drakulavich/parakeet-cli)](https://www.npmjs.com/package/@drakulavich/parakeet-cli)
 [![CI](https://github.com/drakulavich/parakeet-cli/actions/workflows/ci.yml/badge.svg)](https://github.com/drakulavich/parakeet-cli/actions/workflows/ci.yml)
+[![npm version](https://img.shields.io/npm/v/@drakulavich/parakeet-cli)](https://www.npmjs.com/package/@drakulavich/parakeet-cli)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 [![Bun](https://img.shields.io/badge/runtime-Bun-f9f1e1?logo=bun)](https://bun.sh)
-Fast multilingual speech-to-text CLI powered by NVIDIA Parakeet models. Zero Python. CoreML on Apple Silicon, ONNX on CPU.
-## Features
+Fast local speech-to-text. 25 languages. ~18x faster than Whisper on Apple Silicon.
-- **25 languages** — automatic language detection, no prompting needed
-- **~155x real-time on Apple Silicon** — CoreML backend via [FluidAudio](https://github.com/FluidInference/FluidAudio) (1 min audio in ~0.4s)
-- **3x faster than Whisper** on CPU with ONNX fallback (see [benchmark](#benchmark))
-- **Zero Python** — pure TypeScript/Bun, native Swift binary for CoreML
-- **Smart install** — `parakeet install` auto-detects platform: CoreML on macOS arm64, ONNX elsewhere
-- **Any audio format** — ffmpeg handles OGG, MP3, WAV, FLAC, M4A, etc.
+- **CoreML on Apple Silicon** — ~155x real-time via [FluidAudio](https://github.com/FluidInference/FluidAudio)
+- **ONNX on CPU** — cross-platform fallback, 3x faster than Whisper
+- **Any audio format** — ffmpeg handles OGG, MP3, WAV, FLAC, M4A
+- **Zero Python** — Bun + TypeScript, native Swift binary for CoreML
-## Install
-Using Bun (recommended):
+## Quick Start
 ```bash
 bun install -g @drakulavich/parakeet-cli
+parakeet install          # CoreML on macOS arm64, ONNX elsewhere
+parakeet audio.ogg        # → transcript to stdout
 ```
-Using npm (requires Bun runtime installed):
-```bash
-npm install -g @drakulavich/parakeet-cli
-```
-Or clone and link locally:
+## Usage
 ```bash
-git clone https://github.com/drakulavich/parakeet-cli.git
-cd parakeet-cli
-bun install
-bun link
+parakeet install                 # auto-detect backend
+parakeet install --coreml        # force CoreML (macOS arm64)
+parakeet install --onnx          # force ONNX (~3GB)
+parakeet audio.ogg               # transcribe (language auto-detected)
+parakeet --version
 ```
-> **Note:** Bun is required as the runtime — the CLI uses Bun-native APIs and TypeScript execution. You can use either `bun` or `npm` as the package manager to install it, but Bun must be available in PATH to run the `parakeet` command.
+Stdout: transcript. Stderr: errors. Pipe-friendly.
-## Usage
+## Requirements
-```bash
-# Download backend (required before first use)
-# On macOS Apple Silicon: downloads CoreML binary
-# On Linux/other: downloads ONNX models (~3GB)
-parakeet install
+- [Bun](https://bun.sh) >= 1.3
+- [ffmpeg](https://ffmpeg.org) in PATH (ONNX backend only)
+- ~3GB disk (ONNX models)
-# Force a specific backend
-parakeet install --coreml    # CoreML (macOS arm64 only)
-parakeet install --onnx      # ONNX (any platform)
+## Benchmark
-# Transcribe any audio file (language auto-detected)
-parakeet audio.ogg
+> **~18x faster than Whisper** on Apple Silicon (CoreML)
-# Force re-download
-parakeet install --no-cache
+<details>
+<summary>MacBook Pro M3 Pro — 10 Russian voice messages</summary>
-# Show version
-parakeet --version
+```
+faster-whisper (CPU):  35.3s  ██████████████████████████████████████
+Parakeet (CoreML):      1.9s  ██
 ```
-Output goes to stdout, errors to stderr. Designed for piping and scripting.
-## Benchmark
-10 Telegram voice messages (Russian, 3-10s each) on MacBook Pro M3 Pro:
-| | faster-whisper (CPU) | Parakeet (CoreML) |
-|---|---|---|
-| **Total time** | 35.3s | 1.9s |
-| **Speedup** | | **~18x faster** |
+| | faster-whisper | Parakeet | Speedup |
+|---|---|---|---|
+| Apple Silicon (CoreML) | 35.3s | **1.9s** | **~18x** |
+| Linux CI (ONNX) | 79.2s | **45.4s** | **~1.7x** |
-Models: faster-whisper medium (int8) vs Parakeet TDT 0.6B v3 (CoreML, Apple Neural Engine).
+</details>
-See [BENCHMARK.md](BENCHMARK.md) for full results with transcripts. Updated automatically on each release.
+Full results with transcripts: [BENCHMARK.md](BENCHMARK.md)
 ## Supported Languages
-Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Ukrainian.
+:bulgaria: Bulgarian, :croatia: Croatian, :czech_republic: Czech, :denmark: Danish, :netherlands: Dutch, :gb: English, :estonia: Estonian, :finland: Finnish, :fr: French, :de: German, :greece: Greek, :hungary: Hungarian, :it: Italian, :latvia: Latvian, :lithuania: Lithuanian, :malta: Maltese, :poland: Polish, :portugal: Portuguese, :romania: Romanian, :ru: Russian, :slovakia: Slovak, :slovenia: Slovenian, :es: Spanish, :sweden: Swedish, :ukraine: Ukrainian
 ## How It Works
-### CoreML backend (macOS Apple Silicon)
-```
-parakeet audio.ogg
-  |
-  +-- parakeet-coreml (Swift binary via FluidAudio)
-  |   +-- CoreML inference on Apple Neural Engine
-  |   +-- ~155x real-time on M4 Pro
-  |
-  stdout: transcript
-```
-Uses [FluidAudio](https://github.com/FluidInference/FluidAudio) with the [CoreML model](https://huggingface.co/FluidInference/parakeet-tdt-0.6b-v3-coreml). CoreML model files are downloaded by FluidAudio on first transcription.
-### ONNX backend (cross-platform fallback)
 ```
 parakeet audio.ogg
-  |
-  +-- ffmpeg: any format -> 16kHz mono float32
-  +-- nemo128.onnx: waveform -> 128-dim log-mel spectrogram
-  +-- per-utterance normalization (mean=0, std=1)
-  +-- encoder-model.onnx: mel features -> encoder output
-  +-- TDT greedy decoder: encoder output -> token IDs + durations
-  +-- vocab.txt: token IDs -> text
-  |
-  stdout: transcript
-```
-Uses [NVIDIA Parakeet TDT 0.6B v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) exported to ONNX by [istupakov](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx). Run `parakeet install --onnx` to download models from HuggingFace (~3GB).
-## Requirements
-- [Bun](https://bun.sh) >= 1.3 (runtime)
-- [ffmpeg](https://ffmpeg.org) installed and in PATH
-- ~3GB disk space for model cache
-- npm or Bun can be used as the package manager
-### macOS (Apple Silicon)
-Works natively on M1/M2/M3/M4 with CoreML acceleration. Install dependencies with Homebrew:
-```bash
-brew install ffmpeg
-curl -fsSL https://bun.sh/install | bash
-bun install -g @drakulavich/parakeet-cli    # or: npm install -g @drakulavich/parakeet-cli
-parakeet install                             # downloads CoreML binary
+  ├── CoreML installed? → parakeet-coreml subprocess → stdout
+  └── ONNX installed?   → ffmpeg → mel → encoder → decoder → stdout
 ```
-### Linux
-```bash
-apt install ffmpeg   # or yum, pacman, etc.
-curl -fsSL https://bun.sh/install | bash
-bun install -g @drakulavich/parakeet-cli    # or: npm install -g @drakulavich/parakeet-cli
-parakeet install                             # downloads ONNX models (~3GB)
-```
+- **CoreML**: Swift binary wraps [FluidAudio](https://github.com/FluidInference/FluidAudio) + [CoreML model](https://huggingface.co/FluidInference/parakeet-tdt-0.6b-v3-coreml)
+- **ONNX**: [NVIDIA Parakeet TDT 0.6B v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) via [onnxruntime-node](https://www.npmjs.com/package/onnxruntime-node)
 ## OpenClaw Integration
-To use parakeet as the voice transcription engine in [OpenClaw](https://docs.openclaw.ai), update `~/.openclaw/openclaw.json`:
+Drop-in replacement for OpenClaw voice processing — no API keys, runs locally.
 ```json
-"tools": {
-  "media": {
-    "audio": {
-      "enabled": true,
-      "models": [
-        {
-          "type": "cli",
-          "command": "parakeet",
-          "args": ["{{MediaPath}}"],
-          "timeoutSeconds": 120
-        }
-      ],
-      "echoTranscript": false
+{
+  "tools": {
+    "media": {
+      "audio": {
+        "enabled": true,
+        "models": [{"type": "cli", "command": "parakeet", "args": ["{{MediaPath}}"], "timeoutSeconds": 120}],
+        "echoTranscript": false
+      }
     }
   }
 }
 ```
-Then restart the gateway: `openclaw gateway restart`
+## Contributing
+See [CONTRIBUTING.md](CONTRIBUTING.md).
 ## License
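
The new "How It Works" tree in the README diff above suggests a simple backend choice: use CoreML when it is installed, otherwise fall back to ONNX. A minimal sketch of that decision follows, assuming the tree is evaluated top to bottom. The names `InstallState` and `pickBackend` are hypothetical and do not appear in the package; `isInstalled` only mirrors the shape of the `isModelInstalled` helper added in `models.ts`.

```typescript
// Hypothetical types for illustration; not part of the package.
type Backend = "coreml" | "onnx";

interface InstallState {
  coremlInstalled: boolean; // assumed to come from a check like isCoreMLInstalled()
  onnxCached: boolean; // assumed to come from a check like isModelCached()
}

// Prefer CoreML when present, otherwise ONNX; null means nothing is installed.
function pickBackend(state: InstallState): Backend | null {
  if (state.coremlInstalled) return "coreml";
  if (state.onnxCached) return "onnx";
  return null;
}

// Same boolean shape as isModelInstalled in the diff: either backend counts.
function isInstalled(state: InstallState): boolean {
  return pickBackend(state) !== null;
}
```

A `null` result is the case where a caller would presumably surface an install hint rather than attempt transcription.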

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "@drakulavich/parakeet-cli",
-  "version": "0.5.3",
-  "description": "Fast multilingual speech-to-text CLI powered by NVIDIA Parakeet ONNX models",
+  "version": "0.6.1",
+  "description": "Fast local speech-to-text CLI. CoreML on Apple Silicon, ONNX on CPU. 25 languages.",
   "type": "module",
   "bin": {
     "parakeet": "bin/parakeet.js"
@@ -49,6 +49,7 @@
   },
   "devDependencies": {
     "@types/bun": "latest",
+    "fast-xml-parser": "^5.5.10",
     "typescript": "^6.0.2"
   },
   "dependencies": {

package/src/models.ts CHANGED

@@ -1,6 +1,7 @@
 import { join, dirname } from "path";
 import { homedir } from "os";
 import { existsSync, mkdirSync, chmodSync } from "fs";
+import { isCoreMLInstalled } from "./coreml";
 export const HF_REPO = "istupakov/parakeet-tdt-0.6b-v3-onnx";
@@ -21,6 +22,10 @@ export function isModelCached(dir?: string): boolean {
   return MODEL_FILES.every((f) => existsSync(join(d, f)));
 }
+export function isModelInstalled(modelDir?: string): boolean {
+  return isCoreMLInstalled() || isModelCached(modelDir);
+}
 export function installHintError(headline: string): Error {
   const lines = [
     headline,
@@ -62,13 +67,35 @@ export async function downloadModel(noCache = false, modelDir?: string): Promise
     console.error(`Downloading ${file}...`);
-    const res = await fetch(url, { redirect: "follow" });
+    let res: Response;
+    try {
+      res = await fetch(url, { redirect: "follow" });
+    } catch (e) {
+      throw new Error(`failed to fetch ${file}: ${e instanceof Error ? e.message : e}`);
+    }
     if (!res.ok) {
-      throw new Error(`failed to download model: ${url} (${res.status})`);
+      throw new Error(`failed to download ${file}: HTTP ${res.status}`);
+    }
+    if (!res.body) {
+      throw new Error(`empty response body for ${file}`);
+    }
+    const writer = Bun.file(dest).writer();
+    let bytes = 0;
+    try {
+      for await (const chunk of res.body) {
+        writer.write(chunk);
+        bytes += chunk.length;
+      }
+    } finally {
+      writer.end();
     }
-    await Bun.write(dest, res);
+    if (bytes === 0) {
+      throw new Error(`downloaded 0 bytes for ${file}`);
+    }
   }
   console.error("Model downloaded successfully.");
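
The `downloadModel` hunk above replaces a single `Bun.write(dest, res)` call with a chunked copy that counts bytes and rejects an empty download. The same pattern can be sketched without Bun or the network, assuming a generic chunk source and sink. `ChunkSink` and `drainToSink` are hypothetical names; in the diff the sink role is played by the writer returned from `Bun.file(dest).writer()` and the source is `res.body`.

```typescript
// Hypothetical sink interface standing in for the file writer used in the diff.
interface ChunkSink {
  write(chunk: Uint8Array): void;
  end(): void;
}

// Copies every chunk from source to sink, always closes the sink,
// and throws if nothing was transferred.
async function drainToSink(
  source: AsyncIterable<Uint8Array>,
  sink: ChunkSink,
  label: string,
): Promise<number> {
  let bytes = 0;
  try {
    for await (const chunk of source) {
      sink.write(chunk);
      bytes += chunk.length;
    }
  } finally {
    // Runs on success and on a mid-stream error, so the sink is never left open.
    sink.end();
  }
  if (bytes === 0) {
    throw new Error(`downloaded 0 bytes for ${label}`);
  }
  return bytes;
}
```

Counting bytes during the copy catches a response that succeeds with an empty body, which a status check alone would miss.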

package/src/__tests__/audio.test.ts DELETED

@@ -1,28 +0,0 @@
-import { describe, test, expect } from "bun:test";
-import { convertToFloat32PCM } from "../audio";
-import { spawnSync } from "child_process";
-const hasFfmpeg = spawnSync("which", ["ffmpeg"]).status === 0;
-describe.skipIf(!hasFfmpeg)("audio", () => {
-  test("converts WAV to 16kHz mono Float32Array", async () => {
-    const buffer = await convertToFloat32PCM("fixtures/silence.wav");
-    expect(buffer).toBeInstanceOf(Float32Array);
-    // 1 second at 16kHz = 16000 samples
-    expect(buffer.length).toBeGreaterThan(15000);
-    expect(buffer.length).toBeLessThan(17000);
-  });
-  test("throws on missing file", async () => {
-    expect(convertToFloat32PCM("nonexistent.wav")).rejects.toThrow(
-      "file not found"
-    );
-  });
-  test("throws on corrupt file", async () => {
-    await Bun.write("fixtures/corrupt.bin", "not audio data");
-    expect(convertToFloat32PCM("fixtures/corrupt.bin")).rejects.toThrow(
-      "failed to convert audio"
-    );
-  });
-});