@openpalm/channel-voice 0.9.0 → 0.9.1

This diff shows the changes between publicly released package versions as they appear in the public registry, and is provided for informational purposes only.
package/README.md ADDED
@@ -0,0 +1,147 @@
+ # @openpalm/channel-voice
+
+ Voice-driven conversational channel for [OpenPalm](https://github.com/itlackey/openpalm). Provides a web-based recording interface with a server-side pipeline that chains STT, LLM, and TTS using OpenAI-compatible APIs.
+
+ ## How it works
+
+ ```
+ mic → STT → LLM → TTS → speaker
+ ```
+
+ 1. User speaks into the microphone (browser captures audio)
+ 2. Audio is transcribed to text (server STT or browser Speech Recognition)
+ 3. Text is forwarded to the assistant via the guardian (or direct LLM fallback)
+ 4. Response is synthesized to audio (server TTS or browser speechSynthesis)
+ 5. Audio plays back to the user
+
+ Every step has a browser fallback — the channel works with zero API keys using only the Web Speech API.
+
+ ## Quick start
+
+ ```bash
+ # Install dependencies
+ bun install
+
+ # Run locally (defaults to Ollama at localhost:11434)
+ bun run dev
+ ```
+
+ Open `http://localhost:8090` in your browser. Tap the microphone or press Space to start talking.
+
+ ## Configuration
+
+ Copy `.env.example` to `.env` and adjust as needed. All settings use OpenAI-compatible API formats.
+
+ ### LLM (direct fallback)
+
+ When the guardian is unavailable (e.g. running outside Docker), the channel calls the LLM directly.
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `LLM_BASE_URL` | `http://localhost:11434` | LLM API base URL (Ollama default) |
+ | `LLM_API_KEY` | `ollama` | API key |
+ | `LLM_MODEL` | `qwen2.5:3b` | Model name |
+ | `LLM_SYSTEM_PROMPT` | *(conversational)* | System prompt for voice responses |
+ | `LLM_TIMEOUT_MS` | `60000` | Request timeout |
+
+ ### STT (Speech-to-Text)
+
+ Server-side transcription. If not configured, the browser's `SpeechRecognition` API is used.
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `STT_BASE_URL` | *(empty)* | STT API base URL |
+ | `STT_API_KEY` | *(empty)* | API key |
+ | `STT_MODEL` | `whisper-1` | Model name |
+ | `STT_TIMEOUT_MS` | `30000` | Request timeout |
+
+ ### TTS (Text-to-Speech)
+
+ Server-side speech synthesis. If not configured, the browser's `speechSynthesis` API is used.
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `TTS_BASE_URL` | *(empty)* | TTS API base URL |
+ | `TTS_API_KEY` | *(empty)* | API key |
+ | `TTS_MODEL` | `tts-1` | Model name |
+ | `TTS_VOICE` | `alloy` | Voice name |
+ | `TTS_TIMEOUT_MS` | `30000` | Request timeout |
+
+ ### Server
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `PORT` | `8186` | HTTP server port |
+ | `GUARDIAN_URL` | `http://guardian:8080` | Guardian service URL (Docker) |
+ | `CHANNEL_VOICE_SECRET` | *(required)* | HMAC secret for guardian signing |
+ | `OPENAI_API_KEY` | *(empty)* | Shared fallback key for STT/TTS/LLM |
+
+ ## Docker Compose
+
+ The voice channel runs in the unified `openpalm/channel` image. Add the registry overlay to your stack:
+
+ ```bash
+ # Copy the overlay
+ cp registry/channels/voice.yml ~/.config/openpalm/channels/
+
+ # Restart the stack
+ docker compose -f docker-compose.yml -f channels/voice.yml up -d
+ ```
+
+ The web UI is served at the channel's port (default 8186).
+
+ ## API
+
+ ### `GET /api/health`
+
+ Returns service status and provider configuration.
+
+ ```json
+ {
+   "ok": true,
+   "service": "channel-voice",
+   "stt": { "model": "whisper-1", "configured": false },
+   "tts": { "model": "tts-1", "voice": "alloy", "configured": false },
+   "llm": { "model": "qwen2.5:3b", "configured": true }
+ }
+ ```
+
+ ### `POST /api/pipeline`
+
+ Full voice pipeline. Accepts `multipart/form-data` with either:
+
+ - `audio` — audio file (server STT transcribes it)
+ - `text` — pre-transcribed text (browser STT path)
+
+ Response:
+
+ ```json
+ {
+   "transcript": "What is the capital of France?",
+   "response": "The capital of France is Paris.",
+   "audio": "<base64 mp3 or null>"
+ }
+ ```
+
+ ## Features
+
+ - **Browser fallback** — Works without any API keys using Web Speech APIs
+ - **Continuous listening** — Toggle auto-restart to keep the mic open between responses
+ - **Markdown rendering** — AI responses render bold, italic, and code blocks in the UI
+ - **Markdown stripping** — TTS reads clean prose, not syntax characters
+ - **LLM fallback** — Direct LLM call when the guardian/assistant is unreachable
+ - **PWA** — Installable with offline shell caching
+ - **Accessible** — Keyboard nav (Space to toggle), screen reader announcements, focus outlines
+
+ ## Development
+
+ ```bash
+ bun run dev        # Start with hot reload (port 8090)
+ bun run test       # Unit tests (bun:test)
+ bun run test:e2e   # Playwright e2e tests (22 tests)
+ bun run typecheck  # TypeScript check
+ ```
+
+ ## License
+
+ [MPL-2.0](https://www.mozilla.org/en-US/MPL/2.0/)
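The `POST /api/pipeline` contract documented in the README above can be exercised from any multipart-capable client. A minimal TypeScript sketch follows; the field names and default port come from the README and the 0.9.1 handler, while `buildPipelineForm`, `askVoiceChannel`, and the URL are illustrative helpers, not part of the package:

```typescript
// Browser-STT path: `text` is already transcribed, so the server skips STT
// and runs only the LLM and (if configured) TTS steps.
function buildPipelineForm(text: string, clientId: string): FormData {
  const form = new FormData()
  form.append('text', text)
  form.append('clientId', clientId) // 0.9.1 prefers this over x-forwarded-for
  return form
}

interface PipelineResult {
  transcript: string
  response: string
  audio: string | null
}

// Hypothetical caller; assumes the channel is listening on its default port.
async function askVoiceChannel(text: string): Promise<PipelineResult> {
  const res = await fetch('http://localhost:8186/api/pipeline', {
    method: 'POST',
    body: buildPipelineForm(text, 'readme-example'),
  })
  if (!res.ok) throw new Error(`pipeline failed: ${res.status}`)
  return (await res.json()) as PipelineResult
}
```

Sending `audio` instead of `text` works the same way, but only when server-side STT is configured; otherwise the handler returns `stt_not_configured`.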
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@openpalm/channel-voice",
- "version": "0.9.0",
+ "version": "0.9.1",
  "type": "module",
  "license": "MPL-2.0",
  "repository": {
@@ -8,6 +8,7 @@
  "url": "https://github.com/itlackey/openpalm",
  "directory": "packages/channel-voice"
  },
+ "access": "public",
  "main": "src/index.ts",
  "files": [
  "src",
@@ -15,8 +16,7 @@
  ],
  "scripts": {
  "start": "bun run src/index.ts",
- "dev": "export CHANNEL_VOICE_SECRET=105a158d326fa54e569b234d4458ada2 && export PORT=8090 && bun --watch run src/index.ts",
- "dev:unset": "unset STT_API_KEY && unset OPENAI_API_KEY && export CHANNEL_VOICE_SECRET=105a158d326fa54e569b234d4458ada2 && export PORT=8090 && bun --watch run src/index.ts",
+ "dev": "CHANNEL_VOICE_SECRET=test-secret bun --watch run src/index.ts",
  "typecheck": "tsc --noEmit",
  "test": "bun test src/",
  "test:e2e": "npx playwright test --config=playwright.config.ts"
@@ -28,4 +28,4 @@
  "@openpalm/channels-sdk": ">=0.8.0 <1.0.0",
  "@playwright/test": "^1.58.2"
  }
- }
+ }
package/src/config.ts CHANGED
@@ -7,52 +7,69 @@ import { resolve } from 'node:path'

  interface Config {
  server: { webRoot: string }
- stt: { baseUrl: string; apiKey: string; model: string; timeoutMs: number }
- tts: { baseUrl: string; apiKey: string; model: string; voice: string; timeoutMs: number }
+ stt: { baseUrl: string; apiKey: string; model: string; timeoutMs: number; configured: boolean }
+ tts: { baseUrl: string; apiKey: string; model: string; voice: string; timeoutMs: number; configured: boolean }
  llm: { baseUrl: string; apiKey: string; model: string; timeoutMs: number; systemPrompt: string }
  }

+ // env uses ?? so an explicit empty value (KEY=) clears the default.
+ // envOrDefault uses || so empty strings still get the fallback (for models, voices, etc).
  function env(key: string, fallback = ''): string {
+ return Bun.env[key] ?? fallback
+ }
+
+ function envOrDefault(key: string, fallback: string): string {
  return Bun.env[key] || fallback
  }

  function envInt(key: string, fallback: number): number {
  const v = Bun.env[key]
- if (!v) return fallback
+ if (v === undefined || v === '') return fallback
  const n = parseInt(v, 10)
  return Number.isNaN(n) ? fallback : n
  }

  // Resolve API key: check dedicated key first, then shared OPENAI_API_KEY.
- // Only use OPENAI_API_KEY if the dedicated key is truly unset (not present in env at all),
- // to avoid shell-inherited vars overriding .env values unexpectedly.
+ // Falls back to OPENAI_API_KEY only when the dedicated key is absent or
+ // explicitly empty; an empty key means "no key" (keyless provider).
  function resolveApiKey(dedicatedKey: string): string {
  const dedicated = Bun.env[dedicatedKey]
  if (dedicated !== undefined && dedicated !== '') return dedicated
- return Bun.env.OPENAI_API_KEY || ''
+ return Bun.env.OPENAI_API_KEY ?? ''
  }

+ // STT/TTS are considered "configured" when a base URL is set (even without
+ // a key — local providers like whisper-local, kokoro, piper are keyless).
+ function isProviderConfigured(baseUrl: string): boolean {
+ return baseUrl !== ''
+ }
+
+ const sttBaseUrl = env('STT_BASE_URL').replace(/\/$/, '')
+ const ttsBaseUrl = env('TTS_BASE_URL').replace(/\/$/, '')
+
  export const config: Config = {
  server: {
  webRoot: resolve(env('WEB_ROOT', new URL('../web', import.meta.url).pathname)),
  },
  stt: {
- baseUrl: env('STT_BASE_URL', 'https://api.openai.com').replace(/\/$/, ''),
+ baseUrl: sttBaseUrl,
  apiKey: resolveApiKey('STT_API_KEY'),
- model: env('STT_MODEL', 'whisper-1'),
+ model: envOrDefault('STT_MODEL', 'whisper-1'),
  timeoutMs: envInt('STT_TIMEOUT_MS', 30_000),
+ configured: isProviderConfigured(sttBaseUrl),
  },
  tts: {
- baseUrl: env('TTS_BASE_URL', 'https://api.openai.com').replace(/\/$/, ''),
+ baseUrl: ttsBaseUrl,
  apiKey: resolveApiKey('TTS_API_KEY'),
- model: env('TTS_MODEL', 'tts-1'),
- voice: env('TTS_VOICE', 'alloy'),
+ model: envOrDefault('TTS_MODEL', 'tts-1'),
+ voice: envOrDefault('TTS_VOICE', 'alloy'),
  timeoutMs: envInt('TTS_TIMEOUT_MS', 30_000),
+ configured: isProviderConfigured(ttsBaseUrl),
  },
  llm: {
- baseUrl: env('LLM_BASE_URL', 'http://localhost:11434').replace(/\/$/, ''),
- apiKey: env('LLM_API_KEY', 'ollama'),
- model: env('LLM_MODEL', 'qwen2.5:3b'),
+ baseUrl: envOrDefault('LLM_BASE_URL', 'http://localhost:11434').replace(/\/$/, ''),
+ apiKey: envOrDefault('LLM_API_KEY', 'ollama'),
+ model: envOrDefault('LLM_MODEL', 'qwen2.5:3b'),
  timeoutMs: envInt('LLM_TIMEOUT_MS', 60_000),
  systemPrompt: env('LLM_SYSTEM_PROMPT', 'You are a helpful voice assistant. Respond conversationally and concisely. Do not use markdown formatting.'),
  },
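The `env` / `envOrDefault` split added in this file hinges on how `??` and `||` treat an explicitly empty variable (`KEY=`). A runnable sketch of the same two helpers over a stand-in env object (the `fakeEnv` table is illustrative, not from the package):

```typescript
// Stand-in for Bun.env so the distinction is easy to see in isolation.
const fakeEnv: Record<string, string | undefined> = {
  STT_BASE_URL: '',   // explicitly cleared with STT_BASE_URL=
  TTS_MODEL: '',      // empty, but a model name must never end up empty
}

// ?? only falls back when the key is absent; an explicit '' wins.
function env(key: string, fallback = ''): string {
  return fakeEnv[key] ?? fallback
}

// || also falls back on '', so models/voices always get a usable default.
function envOrDefault(key: string, fallback: string): string {
  return fakeEnv[key] || fallback
}

console.log(env('STT_BASE_URL', 'https://api.openai.com')) // → '' (provider stays unconfigured)
console.log(envOrDefault('TTS_MODEL', 'tts-1'))            // → 'tts-1' despite the empty value
console.log(env('NO_SUCH_KEY', 'dflt'))                    // → 'dflt'
```

This is why base URLs go through `env` (clearing them disables the provider) while model and voice names go through `envOrDefault`.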
package/src/index.ts CHANGED
@@ -11,7 +11,7 @@
  * GET /* — Static file serving from web/ directory
  */

- import { extname, join, resolve } from 'node:path'
+ import { extname, join, resolve, sep } from 'node:path'
  import { BaseChannel, type HandleResult, createLogger } from '@openpalm/channels-sdk'
  import type { GuardianSuccessResponse } from '@openpalm/channels-sdk'
  import { config } from './config'
@@ -46,8 +46,8 @@ export default class VoiceChannel extends BaseChannel {
  return this.json(200, {
  ok: true,
  service: 'channel-voice',
- stt: { model: config.stt.model, configured: !!config.stt.apiKey },
- tts: { model: config.tts.model, voice: config.tts.voice, configured: !!config.tts.apiKey },
+ stt: { model: config.stt.model, configured: config.stt.configured },
+ tts: { model: config.tts.model, voice: config.tts.voice, configured: config.tts.configured },
  llm: { model: config.llm.model, configured: !!config.llm.apiKey },
  })
  }
@@ -82,8 +82,11 @@
  return this.json(413, { error: 'Audio too large (max 25MB)' })
  }

- const userId = req.headers.get('x-forwarded-for')
- || req.headers.get('x-real-ip')
+ // Use client-provided ID (from x-client-id header or form field),
+ // falling back to x-forwarded-for (first IP only) or a default.
+ const clientId = (form.get('clientId') as string | null)
+ || req.headers.get('x-client-id')
+ || (req.headers.get('x-forwarded-for') || '').split(',')[0].trim()
  || 'voice-user'

  // Step 1: STT — transcribe audio, or use provided text (browser STT fallback)
@@ -91,7 +94,7 @@
  if (typeof textField === 'string' && textField.trim()) {
  transcript = textField.trim()
  } else if (audioFile instanceof File) {
- if (!config.stt.apiKey) {
+ if (!config.stt.configured) {
  return this.json(400, { error: 'STT not configured', code: 'stt_not_configured' })
  }
  try {
@@ -111,7 +114,7 @@
  // Step 2: Forward transcript to guardian, fall back to direct LLM
  let answer: string
  try {
- const guardianResp = await this.forward({ userId, text: transcript })
+ const guardianResp = await this.forward({ userId: clientId, text: transcript })

  if (!guardianResp.ok) {
  this.log('error', 'Guardian error', { status: guardianResp.status })
@@ -145,8 +148,8 @@
  const pathname = url.pathname === '/' ? '/index.html' : url.pathname
  const filePath = resolve(join(config.server.webRoot, pathname.replace(/^\/+/, '')))

- // Prevent path traversal
- if (!filePath.startsWith(config.server.webRoot)) {
+ // Prevent path traversal — ensure resolved path is strictly inside webRoot
+ if (!filePath.startsWith(config.server.webRoot + sep) && filePath !== config.server.webRoot) {
  return new Response('Forbidden', { status: 403 })
  }

@@ -185,8 +188,8 @@
  if (import.meta.main) {
  const log = createLogger('channel-voice')
  log.info('config', {
- stt: config.stt.apiKey ? `${config.stt.baseUrl} (${config.stt.model})` : 'not configured — browser fallback',
- tts: config.tts.apiKey ? `${config.tts.baseUrl} (${config.tts.model}, ${config.tts.voice})` : 'not configured — browser fallback',
+ stt: config.stt.configured ? `${config.stt.baseUrl} (${config.stt.model})` : 'not configured — browser fallback',
+ tts: config.tts.configured ? `${config.tts.baseUrl} (${config.tts.model}, ${config.tts.voice})` : 'not configured — browser fallback',
  })
  const channel = new VoiceChannel()
  channel.start()
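The tightened traversal check in this file exists because a bare `startsWith(webRoot)` also matches sibling directories whose names merely begin with the web root (e.g. `/srv/web` vs `/srv/web-evil`). A self-contained sketch of the new predicate, with hypothetical paths built from `sep` so it is platform-neutral:

```typescript
import { sep } from 'node:path'

// True only when filePath is webRoot itself or strictly inside it.
// A bare filePath.startsWith(webRoot), as in 0.9.0, would also accept
// a sibling directory such as `${webRoot}-evil`.
function insideWebRoot(filePath: string, webRoot: string): boolean {
  return filePath === webRoot || filePath.startsWith(webRoot + sep)
}

const webRoot = ['', 'srv', 'web'].join(sep) // "/srv/web" on POSIX

console.log(insideWebRoot(webRoot + sep + 'index.html', webRoot))          // → true
console.log(insideWebRoot(webRoot + '-evil' + sep + 'index.html', webRoot)) // → false
```

The `filePath === webRoot` clause keeps a request that resolves exactly to the root directory from being rejected.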
package/src/providers.ts CHANGED
@@ -1,5 +1,7 @@
  /**
  * STT and TTS API calls. Both use OpenAI-compatible APIs.
+ * Auth headers are only sent when an API key is configured,
+ * allowing keyless local providers (whisper-local, kokoro, piper).
  */

  import { createLogger } from '@openpalm/channels-sdk'
@@ -7,7 +9,7 @@ import { config } from './config'

  const log = createLogger('channel-voice')

- // ── Timeout helper ──────────────────────────────────────────────────────
+ // ── Helpers ────────────────────────────────────────────────────────────

  async function fetchWithTimeout(url: string, init: RequestInit, timeoutMs: number): Promise<Response> {
  const controller = new AbortController()
@@ -24,11 +26,31 @@ async function fetchWithTimeout(url: string, init: RequestInit, timeoutMs: numbe
  }
  }

+ /** Build auth headers only when a key is present (keyless providers get none). */
+ function authHeaders(apiKey: string): Record<string, string> {
+ return apiKey ? { Authorization: `Bearer ${apiKey}` } : {}
+ }
+
+ /** Strip markdown syntax so TTS reads clean prose. */
+ function stripMarkdown(text: string): string {
+ return text
+ .replace(/```[\s\S]*?```/g, '')
+ .replace(/`([^`]+)`/g, '$1')
+ .replace(/\*\*([^*]+)\*\*/g, '$1')
+ .replace(/\*([^*]+)\*/g, '$1')
+ .replace(/^#{1,6}\s+/gm, '')
+ .replace(/^\s*[-*+]\s+/gm, '')
+ .replace(/^\s*\d+\.\s+/gm, '')
+ .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')
+ .replace(/\n{3,}/g, '\n\n')
+ .trim()
+ }
+
  // ── STT ─────────────────────────────────────────────────────────────────

  /**
  * Transcribe audio via OpenAI-compatible STT API.
- * Accepts the raw File from the client's FormData.
+ * Auth header is omitted for keyless providers (e.g. local whisper).
  */
  export async function transcribe(audioFile: File): Promise<string> {
  const form = new FormData()
@@ -39,7 +61,7 @@ export async function transcribe(audioFile: File): Promise<string> {
  `${config.stt.baseUrl}/v1/audio/transcriptions`,
  {
  method: 'POST',
- headers: { Authorization: `Bearer ${config.stt.apiKey}` },
+ headers: authHeaders(config.stt.apiKey),
  body: form,
  },
  config.stt.timeoutMs,
@@ -58,26 +80,11 @@

  /**
  * Synthesize text to audio via OpenAI-compatible TTS API.
- * Returns base64-encoded mp3 string, or null if TTS is not configured or fails.
- * TTS failure is non-fatal; the client still gets the text response.
+ * Returns base64-encoded mp3, or null if TTS is not configured or fails.
+ * Auth header is omitted for keyless providers (e.g. kokoro, piper).
  */
- /** Strip markdown syntax so TTS reads clean prose. */
- function stripMarkdown(text: string): string {
- return text
- .replace(/```[\s\S]*?```/g, '') // remove code blocks
- .replace(/`([^`]+)`/g, '$1') // inline code → plain text
- .replace(/\*\*([^*]+)\*\*/g, '$1') // bold → plain
- .replace(/\*([^*]+)\*/g, '$1') // italic → plain
- .replace(/^#{1,6}\s+/gm, '') // headings → plain
- .replace(/^\s*[-*+]\s+/gm, '') // list markers → plain
- .replace(/^\s*\d+\.\s+/gm, '') // numbered lists → plain
- .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1') // links → text only
- .replace(/\n{3,}/g, '\n\n') // collapse excess newlines
- .trim()
- }
-
  export async function synthesize(text: string): Promise<string | null> {
- if (!text.trim() || !config.tts.apiKey) return null
+ if (!text.trim() || !config.tts.configured) return null

  const cleanText = stripMarkdown(text)
  if (!cleanText) return null
@@ -89,7 +96,7 @@ export async function synthesize(text: string): Promise<string | null> {
  {
  method: 'POST',
  headers: {
- Authorization: `Bearer ${config.tts.apiKey}`,
+ ...authHeaders(config.tts.apiKey),
  'Content-Type': 'application/json',
  },
  body: JSON.stringify({
@@ -130,7 +137,7 @@ export async function chatCompletion(prompt: string): Promise<string> {
  {
  method: 'POST',
  headers: {
- Authorization: `Bearer ${config.llm.apiKey}`,
+ ...authHeaders(config.llm.apiKey),
  'Content-Type': 'application/json',
  },
  body: JSON.stringify({
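The `stripMarkdown` chain moved in the diff above is order-sensitive: fenced blocks must be removed before inline code, and bold before italic, or the single-`*` rule consumes half of a `**` pair. A runnable reproduction with a sample input (the fenced-code rule is spelled with `{3}` quantifiers here only so this example's own code fence survives; it matches the same text):

```typescript
function stripMarkdown(text: string): string {
  return text
    .replace(/`{3}[\s\S]*?`{3}/g, '')        // fenced code blocks → dropped
    .replace(/`([^`]+)`/g, '$1')             // inline code → plain text
    .replace(/\*\*([^*]+)\*\*/g, '$1')       // bold first,
    .replace(/\*([^*]+)\*/g, '$1')           // then italic, or ** pairs get mangled
    .replace(/^#{1,6}\s+/gm, '')             // headings → plain
    .replace(/^\s*[-*+]\s+/gm, '')           // bullet markers → plain
    .replace(/^\s*\d+\.\s+/gm, '')           // numbered-list markers → plain
    .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1') // links → link text only
    .replace(/\n{3,}/g, '\n\n')              // collapse excess blank lines
    .trim()
}

console.log(stripMarkdown('**Paris** is the capital, see `wiki`.'))
// → Paris is the capital, see wiki.
```

Keeping the function above the STT section (rather than wedged between the `synthesize` doc comment and its body, as in 0.9.0) is what the relocation in this hunk accomplishes.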
package/web/sw.js CHANGED
@@ -1,4 +1,4 @@
- const CACHE = 'voice-v2'
+ const CACHE = 'voice-v3'
  const SHELL = ['/', '/index.html', '/styles.css', '/app.js', '/manifest.webmanifest']

  self.addEventListener('install', (e) => {
@@ -14,13 +14,18 @@ self.addEventListener('activate', (e) => {

  self.addEventListener('fetch', (e) => {
  const url = new URL(e.request.url)
- if (url.pathname.startsWith('/api/')) return
- // Network-first for all assets (cache is offline fallback only)
+ // Only cache same-origin GET requests; skip API calls and non-GET methods
+ if (e.request.method !== 'GET' || url.pathname.startsWith('/api/')) return
+ // Network-first: update cache on success, serve from cache when offline
  e.respondWith(
  fetch(e.request).then((res) => {
  const clone = res.clone()
  caches.open(CACHE).then((c) => c.put(e.request, clone))
  return res
- }).catch(() => caches.match(e.request))
+ }).catch(() =>
+ caches.match(e.request).then((cached) =>
+ cached || new Response('Offline', { status: 503, headers: { 'Content-Type': 'text/plain' } })
+ )
+ )
  )
  })