npm - handy-remote-server - Versions diffs - 1.1.0 → 1.3.0 - Mend

handy-remote-server 1.1.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/README.md CHANGED

@@ -1,6 +1,6 @@
 # Handy Remote Server 🎙️
-A lightweight standalone inference server for [Handy](https://github.com/cjpais/Handy), allowing you to transcribe audio from external devices, weak computers, and more.
+A lightweight standalone inference server for [Handy](https://github.com/viktor-silakov/Handy), allowing you to transcribe audio from external devices, weak computers, and more.
 ## Installation
@@ -10,30 +10,107 @@ The easiest way to run the external inference server is using `npx`:
 npx handy-remote-server
 ```
-_(You must have Node.js and npm installed)_
+_(You must have Node.js and Rust/Cargo installed)_
 ## Usage
-When you run the server for the first time, it will automatically download the **GigaAM v3** model (Russian-only fast architecture model) if it's not present.
+When you run the server for the first time, it will:
-It will also generate a unique Bearer API Token for your active session:
+1. **Download the GigaAM v3 model** (~100 MB) with a progress bar:
-```bash
-Your API KEY is: xxxxx-xxxxx-xxxxx-xxxxx
+```
+📥 Downloading model...
+   URL:  https://blob.handy.computer/giga-am-v3.int8.onnx
+   Dest: /path/to/models/gigaam.onnx
+   ████████████████████░░░░░░░░░░░░░░░░░░░░  52.3%  52.10 MB / 99.60 MB  12.5 MB/s
+✅ Download complete in 8.2s
+```
+2. **Generate a persistent API key** saved to `~/.handy/api_key`:
+```
+======================================================
+Generated a new API KEY (saved to /Users/you/.handy/api_key)
+Your API KEY is: xxxxx...xxxxx
+======================================================
+```
+The key persists across restarts. On the next launch, it will be loaded automatically.
+3. **Start the server** and log every request in detail:
+```
 Handy Remote Server is running on port 3000
+[2026-03-07T12:00:00.000Z] ── REQUEST #1 ──────────────────────
+  Method:  POST /transcribe
+  From:    192.168.1.5
+  Auth:    OK
+  [#1]  Audio received: 156.3 KB
+  [#1]  Queued for inference (queue length: 0)
+[2026-03-07T12:00:01.234Z] ── RESPONSE #1 ─────────────────────
+  Status:   200
+  Duration: 1.23s
+  Result:   "Привет, как дела?"
 ```
+### Connecting from Handy
 1. Open **Handy** on your client machine.
-2. Go to **Settings > General**, select the `Remote` engine.
-3. Provide the Server URL: `http://<your-server-ip>:3000`
-4. Provide the generated Token.
-5. All audio chunks will now be transcribed by the server!
+2. Go to **Settings > Models**, select **Remote Server**.
+3. Go to **Settings > General**, fill in:
+   - **Remote Server URL**: `http://<your-server-ip>:3000`
+   - **API Token**: the generated token
+4. All transcriptions will now be processed by the server!
-## How it works
+## Environment Variables
-The `handy-remote-server` spins up a tiny Express server alongside a heavily optimized Rust CLI (`rust-infer`) powered by `transcribe-rs`. Audio files are dispatched sequentially from the Node server directly into the Rust engine.
+| Variable         | Default                                     | Description                            |
+| ---------------- | ------------------------------------------- | -------------------------------------- |
+| `PORT`           | `3000`                                      | Server port                            |
+| `API_KEY`        | auto-generated, saved to `~/.handy/api_key` | Bearer token for authentication        |
+| `INFER_CLI_PATH` | auto-detected                               | Path to the `rust-infer` binary        |
+| `MODEL_TYPE`     | `gigaam`                                    | Transcription model to use (see below) |
+## Supported Models
+```bash
+# Russian (default)
+MODEL_TYPE=gigaam npx handy-remote-server
+# Multi-language (including Russian): Whisper models
+MODEL_TYPE=whisper-tiny npx handy-remote-server      # 75 MB
+MODEL_TYPE=whisper-base npx handy-remote-server      # 142 MB
+MODEL_TYPE=whisper-small npx handy-remote-server     # 487 MB
+MODEL_TYPE=whisper-medium npx handy-remote-server    # 1.5 GB
+# English: Moonshine
+MODEL_TYPE=moonshine-tiny npx handy-remote-server    # 60 MB
+MODEL_TYPE=moonshine-base npx handy-remote-server    # 100 MB
-### Environment variables
+# English: Breeze/Parakeet
+MODEL_TYPE=parakeet npx handy-remote-server          # ~200 MB
+# Multi-language: SenseVoice
+MODEL_TYPE=sensevoice npx handy-remote-server        # ~200 MB
+```
+| Model              | Language       | Size    | Speed     |
+| ------------------ | -------------- | ------- | --------- |
+| `gigaam` (default) | Russian        | ~100 MB | ⚡ Fast   |
+| `whisper-tiny`     | Multi-language | 75 MB   | ⚡ Fast   |
+| `whisper-base`     | Multi-language | 142 MB  | ⚡ Fast   |
+| `whisper-small`    | Multi-language | 487 MB  | 🔄 Medium |
+| `whisper-medium`   | Multi-language | 1.5 GB  | 🐢 Slow   |
+| `moonshine-tiny`   | English        | 60 MB   | ⚡ Fast   |
+| `moonshine-base`   | English        | 100 MB  | ⚡ Fast   |
+| `parakeet`         | English        | ~200 MB | 🔄 Medium |
+| `sensevoice`       | Multi-language | ~200 MB | 🔄 Medium |
+## How It Works
+The `handy-remote-server` spins up a tiny Express server alongside a heavily optimized Rust CLI (`rust-infer`) powered by `transcribe-rs`. Audio files are dispatched sequentially from the Node server directly into the Rust engine.
-- `PORT` - defaults to `3000`
-- `API_KEY` - defaults to an auto-generated token in development. Set this to a permanent token for production.
+Currently the server uses the **GigaAM v3** model (Russian-language, fast inference, ~100 MB).
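
For clients other than Handy, a request to the server described in this README can be sketched as below. This is a minimal sketch, assuming the `POST /transcribe` route, the `audio/wav` content type and the Bearer scheme that appear in `dist/index.js` in this diff. The helper name `buildTranscribeRequest` and the example host are hypothetical and not part of the package.

```javascript
// Hypothetical helper: builds the URL and fetch() options for one
// transcription request. Nothing is sent until fetch() is called.
function buildTranscribeRequest(baseUrl, apiToken, wavBytes) {
  return {
    url: new URL('/transcribe', baseUrl).toString(),
    init: {
      method: 'POST',
      headers: {
        // The server compares the token after "Bearer " with its API key.
        Authorization: `Bearer ${apiToken}`,
        // The route only parses bodies sent as audio/wav.
        'Content-Type': 'audio/wav',
      },
      body: wavBytes, // raw WAV bytes, e.g. a Buffer read from disk
    },
  };
}

// Usage (assumes Node.js 18+ for the global fetch, and a reachable server):
//   const { url, init } = buildTranscribeRequest('http://192.168.1.5:3000', token, wav);
//   const body = await (await fetch(url, init)).json();
//   // body.text on success, body.error on failure
```

The request is built separately from the network call so that the headers can be inspected or logged before anything is sent.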

package/dist/index.js CHANGED

@@ -5,13 +5,14 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
 };
 Object.defineProperty(exports, "__esModule", { value: true });
 const express_1 = __importDefault(require("express"));
-const multer_1 = __importDefault(require("multer"));
 const child_process_1 = require("child_process");
 const crypto_1 = __importDefault(require("crypto"));
 const path_1 = __importDefault(require("path"));
 const fs_1 = __importDefault(require("fs"));
 const os_1 = __importDefault(require("os"));
 const dotenv_1 = __importDefault(require("dotenv"));
+const tar_fs_1 = __importDefault(require("tar-fs"));
+const gunzip_maybe_1 = __importDefault(require("gunzip-maybe"));
 dotenv_1.default.config();
 const app = (0, express_1.default)();
 const port = process.env.PORT || 3000;
@@ -19,36 +20,21 @@ const port = process.env.PORT || 3000;
 const handyDir = path_1.default.join(os_1.default.homedir(), '.handy');
 const keyFilePath = path_1.default.join(handyDir, 'api_key');
 function loadOrCreateApiKey() {
-    // 1. Env var takes priority
-    if (process.env.API_KEY) {
+    if (process.env.API_KEY)
         return process.env.API_KEY;
-    }
-    // 2. Try to load from cached file
     if (fs_1.default.existsSync(keyFilePath)) {
         const cached = fs_1.default.readFileSync(keyFilePath, 'utf-8').trim();
-        if (cached.length > 0) {
-            console.log(`\n======================================================`);
-            console.log(`Loaded API KEY from ${keyFilePath}`);
-            console.log(`Your API KEY is: ${cached}`);
-            console.log(`======================================================\n`);
+        if (cached.length > 0)
             return cached;
-        }
     }
-    // 3. Generate a new one and persist it
     const newKey = crypto_1.default.randomBytes(32).toString('hex');
     fs_1.default.mkdirSync(handyDir, { recursive: true });
     fs_1.default.writeFileSync(keyFilePath, newKey + '\n', { mode: 0o600 });
-    console.log(`\n======================================================`);
-    console.log(`Generated a new API KEY (saved to ${keyFilePath})`);
-    console.log(`Your API KEY is: ${newKey}`);
-    console.log(`======================================================\n`);
     return newKey;
 }
 const API_KEY = loadOrCreateApiKey();
 // ── Logging helpers ───────────────────────────────────────────────────
-function timestamp() {
-    return new Date().toISOString();
-}
+function timestamp() { return new Date().toISOString(); }
 function formatBytes(bytes) {
     if (bytes < 1024)
         return `${bytes} B`;
@@ -61,16 +47,129 @@ function formatDuration(ms) {
         return `${ms}ms`;
     return `${(ms / 1000).toFixed(2)}s`;
 }
-// ── Ensure directories ────────────────────────────────────────────────
-const modelsDir = path_1.default.join(__dirname, '..', 'models');
-if (!fs_1.default.existsSync(modelsDir)) {
-    fs_1.default.mkdirSync(modelsDir, { recursive: true });
+const MODEL_REGISTRY = {
+    'gigaam': {
+        engine: 'gigaam',
+        url: 'https://blob.handy.computer/giga-am-v3.int8.onnx',
+        filename: 'gigaam.onnx'
+    },
+    'whisper-tiny': {
+        engine: 'whisper',
+        url: 'https://blob.handy.computer/ggml-tiny.bin',
+        filename: 'whisper-tiny.bin'
+    },
+    'whisper-base': {
+        engine: 'whisper',
+        url: 'https://blob.handy.computer/ggml-base.bin',
+        filename: 'whisper-base.bin'
+    },
+    'whisper-small': {
+        engine: 'whisper',
+        url: 'https://blob.handy.computer/ggml-small.bin',
+        filename: 'whisper-small.bin'
+    },
+    'whisper-medium': {
+        engine: 'whisper',
+        url: 'https://blob.handy.computer/whisper-medium-q4_1.bin',
+        filename: 'whisper-medium.bin'
+    },
+    'moonshine-tiny': {
+        engine: 'moonshine',
+        url: 'https://blob.handy.computer/moonshine-tiny-streaming-en.tar.gz',
+        filename: 'moonshine-tiny', // Dir name after extraction
+        isArchive: true
+    },
+    'moonshine-base': {
+        engine: 'moonshine',
+        url: 'https://blob.handy.computer/moonshine-base.tar.gz',
+        filename: 'moonshine-base',
+        isArchive: true
+    },
+    'parakeet': {
+        engine: 'parakeet',
+        url: 'https://blob.handy.computer/parakeet-v3-int8.tar.gz',
+        filename: 'parakeet-v3',
+        isArchive: true,
+        configFilename: 'preprocessor.json'
+    },
+    'sensevoice': {
+        engine: 'sensevoice',
+        url: 'https://blob.handy.computer/sense-voice-int8.tar.gz',
+        filename: 'sensevoice',
+        isArchive: true
+    }
+};
+const SELECTED_MODEL_TYPE = (process.env.MODEL_TYPE || 'gigaam').toLowerCase();
+const modelCfg = MODEL_REGISTRY[SELECTED_MODEL_TYPE];
+if (!modelCfg) {
+    console.error(`Error: Unknown MODEL_TYPE "${SELECTED_MODEL_TYPE}".`);
+    console.error(`Supported types: ${Object.keys(MODEL_REGISTRY).join(', ')}`);
+    process.exit(1);
 }
+// ── Directories ───────────────────────────────────────────────────────
+const modelsBaseDir = path_1.default.join(__dirname, '..', 'models');
 const uploadDir = path_1.default.join(__dirname, '..', 'uploads');
-if (!fs_1.default.existsSync(uploadDir)) {
-    fs_1.default.mkdirSync(uploadDir, { recursive: true });
+[modelsBaseDir, uploadDir].forEach(d => { if (!fs_1.default.existsSync(d))
+    fs_1.default.mkdirSync(d, { recursive: true }); });
+// ── Model paths ───────────────────────────────────────────────────────
+const modelPath = path_1.default.join(modelsBaseDir, modelCfg.filename);
+let actualModelFile = modelPath;
+let parakeetConfigPath = '';
+if (modelCfg.isArchive) {
+    // For archives, we look for model.onnx inside the directory
+    actualModelFile = path_1.default.join(modelPath, 'model.onnx');
+    if (modelCfg.engine === 'parakeet') {
+        parakeetConfigPath = path_1.default.join(modelPath, modelCfg.configFilename);
+    }
+}
+// ── Download & Extract ────────────────────────────────────────────────
+async function downloadAndPrepare() {
+    if (fs_1.default.existsSync(actualModelFile))
+        return;
+    const dest = modelCfg.isArchive ? modelPath + '.tar.gz' : modelPath;
+    console.log(`\n📥 Downloading model: ${SELECTED_MODEL_TYPE}...`);
+    console.log(`   URL:  ${modelCfg.url}`);
+    const response = await fetch(modelCfg.url);
+    if (!response.ok)
+        throw new Error(`Failed to fetch: ${response.statusText}`);
+    const totalBytes = parseInt(response.headers.get('content-length') || '0', 10);
+    let downloadedBytes = 0;
+    const startTime = Date.now();
+    const fileStream = fs_1.default.createWriteStream(dest);
+    const reader = response.body?.getReader();
+    if (!reader)
+        throw new Error('Body not readable');
+    const barWidth = 40;
+    while (true) {
+        const { done, value } = await reader.read();
+        if (done)
+            break;
+        fileStream.write(Buffer.from(value));
+        downloadedBytes += value.length;
+        const percent = totalBytes > 0 ? downloadedBytes / totalBytes : 0;
+        const filled = Math.round(barWidth * percent);
+        const bar = '█'.repeat(filled) + '░'.repeat(barWidth - filled);
+        const pct = (percent * 100).toFixed(1).padStart(5);
+        const speed = (downloadedBytes / ((Date.now() - startTime) / 1000) / 1024 / 1024).toFixed(1);
+        process.stdout.write(`\r   ${bar} ${pct}%  ${formatBytes(downloadedBytes)} / ${formatBytes(totalBytes)}  ${speed} MB/s   `);
+    }
+    await new Promise(r => fileStream.end(() => r()));
+    process.stdout.write('\n');
+    if (modelCfg.isArchive) {
+        console.log(`📦 Extracting archive to ${modelPath}...`);
+        fs_1.default.mkdirSync(modelPath, { recursive: true });
+        await new Promise((resolve, reject) => {
+            fs_1.default.createReadStream(dest)
+                .pipe((0, gunzip_maybe_1.default)())
+                .pipe(tar_fs_1.default.extract(modelPath))
+                .on('finish', resolve)
+                .on('error', reject);
+        });
+        fs_1.default.unlinkSync(dest); // Cleanup
+    }
+    console.log(`✅ Ready!\n`);
 }
-// ── Request logging middleware ────────────────────────────────────────
+// ── Request logging ───────────────────────────────────────────────────
 let requestCounter = 0;
 app.use((req, res, next) => {
     const reqId = ++requestCounter;
@@ -78,115 +177,61 @@ app.use((req, res, next) => {
     const ip = req.headers['x-forwarded-for'] || req.socket.remoteAddress || 'unknown';
     console.log(`\n[${timestamp()}] ── REQUEST #${reqId} ──────────────────────`);
     console.log(`  Method:  ${req.method} ${req.path}`);
-    console.log(`  From:    ${ip}`);
-    console.log(`  Headers: Content-Type=${req.headers['content-type'] || 'N/A'}, Content-Length=${req.headers['content-length'] || 'N/A'}`);
-    // Store metadata on request for later use
+    console.log(`  Model:   ${SELECTED_MODEL_TYPE}`);
     req._reqId = reqId;
     req._startTime = start;
-    req._ip = ip;
     const originalJson = res.json.bind(res);
     res.json = function (body) {
-        const duration = Date.now() - start;
-        const status = res.statusCode;
-        console.log(`[${timestamp()}] ── RESPONSE #${reqId} ─────────────────────`);
-        console.log(`  Status:   ${status}`);
-        console.log(`  Duration: ${formatDuration(duration)}`);
-        if (body?.text) {
-            const preview = body.text.length > 100 ? body.text.substring(0, 100) + '...' : body.text;
-            console.log(`  Result:   "${preview}"`);
-        }
-        else if (body?.error) {
+        console.log(`[${timestamp()}] ── RESPONSE #${reqId} (Status: ${res.statusCode}, ${Date.now() - start}ms) ─────`);
+        if (body?.text)
+            console.log(`  Result:   "${body.text.substring(0, 100)}${body.text.length > 100 ? '...' : ''}"`);
+        else if (body?.error)
             console.log(`  Error:    ${body.error}`);
-        }
-        console.log(`  ────────────────────────────────────────────────`);
         return originalJson(body);
     };
     next();
 });
-// ── Authentication middleware ─────────────────────────────────────────
+// ── Auth ──────────────────────────────────────────────────────────────
 app.use((req, res, next) => {
     const authHeader = req.headers.authorization;
-    if (!authHeader || !authHeader.startsWith('Bearer ')) {
-        console.log(`  Auth:     REJECTED (missing/invalid Authorization header)`);
-        return res.status(401).json({ error: 'Missing or invalid Authorization header' });
-    }
-    const token = authHeader.split(' ')[1];
-    if (token !== API_KEY) {
-        console.log(`  Auth:     REJECTED (invalid key)`);
-        return res.status(403).json({ error: 'Invalid API Key' });
+    if (!authHeader?.startsWith('Bearer ') || authHeader.split(' ')[1] !== API_KEY) {
+        return res.status(401).json({ error: 'Auth failed' });
     }
-    console.log(`  Auth:     OK`);
     next();
 });
-// ── Multer storage ────────────────────────────────────────────────────
-const storage = multer_1.default.diskStorage({
-    destination: function (req, file, cb) {
-        cb(null, uploadDir);
-    },
-    filename: function (req, file, cb) {
-        const uniqueSuffix = Date.now() + '-' + Math.round(Math.random() * 1E9);
-        cb(null, file.fieldname + '-' + uniqueSuffix + '.wav');
-    }
-});
-const upload = (0, multer_1.default)({ storage: storage });
-// ── Model download ───────────────────────────────────────────────────
-const GIGAAM_MODEL_URL = 'https://blob.handy.computer/giga-am-v3.int8.onnx';
-const gigaamModelPath = path_1.default.join(modelsDir, 'gigaam.onnx');
-async function downloadFile(url, dest) {
-    if (fs_1.default.existsSync(dest))
-        return;
-    console.log(`Downloading ${url} to ${dest}...`);
-    fs_1.default.mkdirSync(path_1.default.dirname(dest), { recursive: true });
-    const response = await fetch(url);
-    if (!response.ok)
-        throw new Error(`Failed to fetch ${url}: ${response.statusText}`);
-    const arrBuffer = await response.arrayBuffer();
-    fs_1.default.writeFileSync(dest, Buffer.from(arrBuffer));
-    console.log(`Downloaded ${dest}`);
-}
-async function ensureModels() {
-    await downloadFile(GIGAAM_MODEL_URL, gigaamModelPath);
-}
+// ── Inference Bridge ──────────────────────────────────────────────────
 let inferProcess = null;
 let isReady = false;
 let resolvers = {};
-ensureModels().then(() => {
-    let inferProcessPath = process.env.INFER_CLI_PATH || path_1.default.join(__dirname, '..', 'rust-infer', 'target', 'release', 'rust-infer');
-    if (!fs_1.default.existsSync(inferProcessPath)) {
-        inferProcessPath = path_1.default.join(__dirname, '..', 'rust-infer', 'target', 'debug', 'rust-infer');
-    }
-    console.log(`Using inference CLI: ${inferProcessPath}`);
-    inferProcess = (0, child_process_1.spawn)(inferProcessPath, [gigaamModelPath], { stdio: ['pipe', 'pipe', 'inherit'] });
+downloadAndPrepare().then(() => {
+    let binPath = process.env.INFER_CLI_PATH || path_1.default.join(__dirname, '..', 'rust-infer', 'target', 'release', 'rust-infer');
+    if (!fs_1.default.existsSync(binPath))
+        binPath = path_1.default.join(__dirname, '..', 'rust-infer', 'target', 'debug', 'rust-infer');
+    console.log(`Starting inference: ${binPath}`);
+    const args = [modelCfg.engine, actualModelFile];
+    if (parakeetConfigPath)
+        args.push(parakeetConfigPath);
+    inferProcess = (0, child_process_1.spawn)(binPath, args, { stdio: ['pipe', 'pipe', 'inherit'] });
     inferProcess.stdout.on('data', (data) => {
-        const lines = data.toString().split('\n').map((l) => l.trim()).filter(Boolean);
-        for (const line of lines) {
-            if (line === 'READY') {
+        data.toString().split('\n').filter(Boolean).forEach(line => {
+            if (line.trim() === 'READY') {
                 isReady = true;
-                console.log('Inference worker is ready.');
-                continue;
+                console.log('--- Model fully loaded and ready ---');
+                return;
             }
             try {
                 const parsed = JSON.parse(line);
-                const resolverCount = Object.keys(resolvers).length;
-                if (resolverCount > 0) {
-                    const firstKey = Object.keys(resolvers)[0];
+                const firstKey = Object.keys(resolvers)[0];
+                if (firstKey) {
                     resolvers[firstKey](parsed);
+                    delete resolvers[firstKey];
                 }
             }
-            catch (e) {
-                console.log('Got non-JSON output from worker:', line);
-            }
-        }
-    });
-    inferProcess.on('exit', (code) => {
-        console.log(`Inference worker exited with code ${code}`);
-        process.exit(code || 1);
+            catch { }
+        });
     });
-}).catch(e => {
-    console.error('Failed to download models:', e);
-    process.exit(1);
-});
-// ── Request queue ─────────────────────────────────────────────────────
+    inferProcess.on('exit', (code) => process.exit(code || 1));
+}).catch(e => { console.error(e); process.exit(1); });
 const requestQueue = [];
 let isProcessing = false;
 function processQueue() {
@@ -194,42 +239,24 @@ function processQueue() {
         return;
     isProcessing = true;
     const req = requestQueue.shift();
-    console.log(`  [Queue]   Processing request #${req.reqId} (queue length: ${requestQueue.length})`);
     resolvers[req.file] = (result) => {
-        delete resolvers[req.file];
         isProcessing = false;
-        if (fs_1.default.existsSync(req.file)) {
+        if (fs_1.default.existsSync(req.file))
             fs_1.default.unlinkSync(req.file);
-        }
         req.resolve(result);
-        process.nextTick(processQueue);
+        processQueue();
     };
     inferProcess.stdin.write(req.file + '\n');
 }
-// ── Transcription endpoint ────────────────────────────────────────────
-app.post('/transcribe', express_1.default.raw({ type: 'audio/wav', limit: '50mb' }), async (req, res) => {
-    const reqId = req._reqId || 0;
-    if (!isReady) {
-        console.log(`  [#${reqId}]  Rejected: models still loading`);
-        return res.status(503).json({ error: 'Models are still loading' });
-    }
-    if (!req.body || !Buffer.isBuffer(req.body)) {
-        console.log(`  [#${reqId}]  Rejected: invalid audio body`);
-        return res.status(400).json({ error: 'Invalid audio body. Send raw WAV bytes with Content-Type: audio/wav' });
-    }
-    const audioSize = req.body.length;
-    console.log(`  [#${reqId}]  Audio received: ${formatBytes(audioSize)}`);
-    const tempFilePath = path_1.default.join(uploadDir, `upload-${Date.now()}-${Math.random().toString(36).substring(7)}.wav`);
-    fs_1.default.writeFileSync(tempFilePath, req.body);
-    console.log(`  [#${reqId}]  Queued for inference (queue length: ${requestQueue.length})`);
-    const result = await new Promise((resolve) => {
-        requestQueue.push({ file: tempFilePath, resolve, reqId });
+app.post('/transcribe', express_1.default.raw({ type: 'audio/wav', limit: '100mb' }), async (req, res) => {
+    if (!isReady)
+        return res.status(503).json({ error: 'Starting up' });
+    const tempFile = path_1.default.join(uploadDir, `up-${Date.now()}.wav`);
+    fs_1.default.writeFileSync(tempFile, req.body);
+    const result = await new Promise(r => {
+        requestQueue.push({ file: tempFile, resolve: r, reqId: req._reqId });
         processQueue();
     });
     res.json(result);
 });
-// ── Start server ──────────────────────────────────────────────────────
-app.listen(port, () => {
-    console.log(`\nHandy Remote Server is running on port ${port}`);
-    console.log(`Waiting for requests...\n`);
-});
+app.listen(port, () => console.log(`\nHandy Server on port ${port} | API Key: ${API_KEY}`));
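
The way a `MODEL_REGISTRY` entry turns into the argument list for `rust-infer` can be restated as one pure function. This is a minimal sketch of the path logic added in this file; the helper name `buildInferArgs` and the `/srv/models` directory are hypothetical, and the `model.onnx` file name inside extracted archives is taken from the diff as-is.

```javascript
const path = require('path');

// Hypothetical helper mirroring the path logic in the diff: returns the
// argument list that would be passed to the rust-infer binary.
function buildInferArgs(modelsBaseDir, cfg) {
  const modelPath = path.join(modelsBaseDir, cfg.filename);
  // Archives are extracted into a directory; the model file is looked up inside it.
  const modelFile = cfg.isArchive ? path.join(modelPath, 'model.onnx') : modelPath;
  const args = [cfg.engine, modelFile];
  // Only an archived parakeet entry carries a config file, appended as a third argument.
  if (cfg.isArchive && cfg.engine === 'parakeet') {
    args.push(path.join(modelPath, cfg.configFilename));
  }
  return args;
}

// Single-file model: two arguments.
console.log(buildInferArgs('/srv/models', { engine: 'whisper', filename: 'whisper-tiny.bin' }));
// Archived parakeet model: three arguments.
console.log(buildInferArgs('/srv/models', {
  engine: 'parakeet',
  filename: 'parakeet-v3',
  isArchive: true,
  configFilename: 'preprocessor.json',
}));
```

Note that the `main.rs` hunk in this diff reads only `args[1]` and `args[2]`, so how the third argument is consumed is not visible here.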

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "handy-remote-server",
-  "version": "1.1.0",
+  "version": "1.3.0",
   "description": "Remote Transcription Server for Handy",
   "main": "dist/index.js",
   "bin": {
@@ -34,9 +34,13 @@
   },
   "devDependencies": {
     "@types/express": "^5.0.6",
+    "@types/gunzip-maybe": "^1.4.3",
     "@types/multer": "^2.1.0",
     "@types/node": "^25.3.5",
+    "@types/tar-fs": "^2.0.4",
+    "gunzip-maybe": "^1.4.2",
+    "tar-fs": "^3.1.2",
     "ts-node": "^10.9.2",
     "typescript": "^5.9.3"
   }
-}
+}

package/rust-infer/src/main.rs CHANGED

@@ -1,35 +1,91 @@
 use std::io::{self, BufRead, Write};
-use std::path::PathBuf;
-use transcribe_rs::{engines::gigaam::GigaAMEngine, TranscriptionEngine};
+use std::path::{Path, PathBuf};
+use transcribe_rs::TranscriptionEngine;
+use transcribe_rs::engines::{
+    gigaam::GigaAMEngine,
+    whisper::WhisperEngine,
+    moonshine::{MoonshineEngine, MoonshineModelParams, ModelVariant},
+    parakeet::{ParakeetEngine, ParakeetModelParams},
+    sense_voice::{SenseVoiceEngine, SenseVoiceModelParams},
+};
+enum EngineWrapper {
+    GigaAM(GigaAMEngine),
+    Whisper(WhisperEngine),
+    Moonshine(MoonshineEngine),
+    Parakeet(ParakeetEngine),
+    SenseVoice(SenseVoiceEngine),
+}
+impl EngineWrapper {
+    fn transcribe_samples(&mut self, audio: Vec<f32>) -> Result<transcribe_rs::TranscriptionResult, Box<dyn std::error::Error>> {
+        match self {
+            EngineWrapper::GigaAM(e) => e.transcribe_samples(audio, None).map_err(|e| e.into()),
+            EngineWrapper::Whisper(e) => e.transcribe_samples(audio, None).map_err(|e| e.into()),
+            EngineWrapper::Moonshine(e) => e.transcribe_samples(audio, None).map_err(|e| e.into()),
+            EngineWrapper::Parakeet(e) => e.transcribe_samples(audio, None).map_err(|e| e.into()),
+            EngineWrapper::SenseVoice(e) => e.transcribe_samples(audio, None).map_err(|e| e.into()),
+        }
+    }
+}
 fn main() -> Result<(), Box<dyn std::error::Error>> {
-    // We get model and config path from args or default
     let args: Vec<String> = std::env::args().collect();
-    let model_path = if args.len() > 1 {
-        args[1].clone()
-    } else {
-        let mut d = std::env::current_dir()?;
-        d.push("models");
-        d.push("gigaam.onnx");
-        d.to_string_lossy().to_string()
-    };
-    // Auto-download logic or print error if missing
-    if !PathBuf::from(&model_path).exists() {
-        eprintln!("Model file not found: {}. Please ensure model file exists.", model_path);
+    // Usage: rust-infer <engine_type> <model_path>
+    if args.len() < 3 {
+        eprintln!("Usage: rust-infer <engine_type> <model_path>");
+        eprintln!("Engines: gigaam, whisper, moonshine, parakeet, sensevoice");
         std::process::exit(1);
     }
-    eprintln!("Loading GigaAM model from {}...", model_path);
-    let mut engine = GigaAMEngine::new();
-    engine.load_model(std::path::Path::new(&model_path)).map_err(|e| format!("Failed to load GigaAM model: {}", e))?;
+    let engine_type = args[1].to_lowercase();
+    let model_path = &args[2];
+    if !PathBuf::from(model_path).exists() {
+        eprintln!("Model file not found: {}", model_path);
+        std::process::exit(1);
+    }
+    eprintln!("Loading {} engine with model {}...", engine_type, model_path);
+    let mut engine = match engine_type.as_str() {
+        "gigaam" => {
+            let mut e = GigaAMEngine::new();
+            e.load_model(Path::new(model_path))?;
+            EngineWrapper::GigaAM(e)
+        }
+        "whisper" => {
+            let mut e = WhisperEngine::new();
+            e.load_model(Path::new(model_path))?;
+            EngineWrapper::Whisper(e)
+        }
+        "moonshine" => {
+            let mut e = MoonshineEngine::new();
+            // Use Base as default for remote
+            e.load_model_with_params(Path::new(model_path), MoonshineModelParams::variant(ModelVariant::Base))?;
+            EngineWrapper::Moonshine(e)
+        }
+        "parakeet" => {
+            let mut e = ParakeetEngine::new();
+            e.load_model_with_params(Path::new(model_path), ParakeetModelParams::int8())?;
+            EngineWrapper::Parakeet(e)
+        }
+        "sensevoice" => {
+            let mut e = SenseVoiceEngine::new();
+            e.load_model_with_params(Path::new(model_path), SenseVoiceModelParams::int8())?;
+            EngineWrapper::SenseVoice(e)
+        }
+        _ => {
+            eprintln!("Unknown engine type: {}", engine_type);
+            std::process::exit(1);
+        }
+    };
     eprintln!("Model loaded. Ready to transcribe.");
-    println!("READY"); // Signal to Node.js that we are ready
+    println!("READY");
     io::stdout().flush()?;
     let stdin = io::stdin();
     for line in stdin.lock().lines() {
         let line = line?;
@@ -37,64 +93,47 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
         if line.is_empty() {
             continue;
         }
         if line == "EXIT" {
             break;
         }
-        // Line format: "file_path"
-        let wav_path = line;
-        // Read the file and convert to f32
-        match read_wav(wav_path) {
+        match read_wav(line) {
             Ok(samples) => {
-                match engine.transcribe_samples(samples, None) {
+                match engine.transcribe_samples(samples) {
                     Ok(result) => {
-                        let json = serde_json::json!({
-                            "status": "success",
-                            "text": result.text
-                        });
+                        let json = serde_json::json!({ "status": "success", "text": result.text });
                         println!("{}", json.to_string());
-                    },
+                    }
                     Err(e) => {
-                        let json = serde_json::json!({
-                            "status": "error",
-                            "error": format!("Transcription failed: {}", e)
-                        });
+                        let json = serde_json::json!({ "status": "error", "error": format!("Transcription failed: {}", e) });
                         println!("{}", json.to_string());
                     }
                 }
-            },
+            }
             Err(e) => {
-                let json = serde_json::json!({
-                    "status": "error",
-                    "error": format!("Failed to read WAV: {}", e)
-                });
+                let json = serde_json::json!({ "status": "error", "error": format!("Failed to read WAV: {}", e) });
                 println!("{}", json.to_string());
             }
         }
         io::stdout().flush()?;
     }
     Ok(())
 }
 fn read_wav(path: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
     let mut reader = hound::WavReader::open(path)?;
     let spec = reader.spec();
     let mut samples = Vec::new();
     match spec.sample_format {
         hound::SampleFormat::Int => {
             if spec.bits_per_sample == 16 {
                 for sample in reader.samples::<i16>() {
-                    let s = sample? as f32 / i16::MAX as f32;
-                    samples.push(s);
+                    samples.push(sample? as f32 / i16::MAX as f32);
                 }
             } else {
                 return Err("Only 16-bit integer WAV is supported".into());
             }
-        },
+        }
         hound::SampleFormat::Float => {
             if spec.bits_per_sample == 32 {
                 for sample in reader.samples::<f32>() {
@@ -105,17 +144,13 @@ fn read_wav(path: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
             }
         }
     }
-    // Multi-channel to mono (simple average)
     if spec.channels > 1 {
         let channels = spec.channels as usize;
         let mut mono = Vec::with_capacity(samples.len() / channels);
         for chunk in samples.chunks(channels) {
-            let sum: f32 = chunk.iter().sum();
-            mono.push(sum / channels as f32);
+            mono.push(chunk.iter().sum::<f32>() / channels as f32);
         }
         samples = mono;
     }
     Ok(samples)
 }
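
The stdout protocol between `rust-infer` and the Node server is line based: a single `READY` line once the model is loaded, then one JSON object per input path, with `status` set to `success` or `error`. A minimal sketch of how a consumer could classify those lines follows. The helper name `parseWorkerLine` and the `kind` labels are hypothetical; only the `READY` marker and the `status`, `text` and `error` fields come from the diff.

```javascript
// Hypothetical helper: classifies one line of rust-infer stdout.
function parseWorkerLine(line) {
  const trimmed = line.trim();
  if (trimmed === '') return { kind: 'empty' };
  // The worker prints READY exactly once, after the model has loaded.
  if (trimmed === 'READY') return { kind: 'ready' };
  try {
    const msg = JSON.parse(trimmed);
    if (msg && msg.status === 'success') return { kind: 'result', text: msg.text };
    if (msg && msg.status === 'error') return { kind: 'failure', error: msg.error };
    return { kind: 'unknown', raw: trimmed };
  } catch {
    // Not JSON: surface the raw line instead of dropping it silently.
    return { kind: 'unknown', raw: trimmed };
  }
}

console.log(parseWorkerLine('READY'));
console.log(parseWorkerLine('{"status":"success","text":"hello"}'));
console.log(parseWorkerLine('{"status":"error","error":"Failed to read WAV: bad header"}'));
```

Returning an explicit `unknown` kind, rather than swallowing parse errors, keeps unexpected worker output visible to the caller.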