npm - @cheeko-ai/esp32-voice - Versions diffs - 2026.2.2-3.1 - Mend

@cheeko-ai/esp32-voice 2026.2.2-3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

package/NPM_PUBLISH_READINESS.md +299 -0
package/README.md +290 -0
package/TODO.md +418 -0
package/index.ts +128 -0
package/openclaw.plugin.json +9 -0
package/package.json +62 -0
package/src/accounts.ts +110 -0
package/src/channel.ts +270 -0
package/src/config-schema.ts +37 -0
package/src/device/device-otp.ts +173 -0
package/src/http-handler.ts +154 -0
package/src/monitor.ts +124 -0
package/src/onboarding.ts +575 -0
package/src/runtime.ts +14 -0
package/src/stt/deepgram.ts +215 -0
package/src/stt/stt-provider.ts +107 -0
package/src/stt/stt-registry.ts +71 -0
package/src/tts/elevenlabs.ts +215 -0
package/src/tts/tts-provider.ts +111 -0
package/src/tts/tts-registry.ts +71 -0
package/src/types.ts +136 -0
package/src/voice/voice-endpoint.ts +296 -0
package/src/voice/voice-session.ts +1041 -0

package/NPM_PUBLISH_READINESS.md ADDED

@@ -0,0 +1,299 @@
+# ESP32-Voice Plugin — npm Publishing Readiness Report
+> Full analysis of what needs to change before this plugin can be published to npm
+> and work out-of-the-box for other OpenClaw users.
+---
+## Current Status: NOT READY for npm publish
+---
+## 🔴 Section 1 — Security (Fix Before Anything Else)
+### 1.1 Real API keys exposed in `SETUP.md`
+`SETUP.md` contains live, active credentials committed to the repository:
+```
+GEMINI_API_KEY=YOUR_GEMINI_API_KEY_HERE
+ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY_HERE
+```
+**Action required:**
+1. Rotate both keys immediately (they are now public):
+   - ElevenLabs: https://elevenlabs.io/app/settings/api-keys → delete → create new
+   - Google AI Studio: https://aistudio.google.com/apikey → delete → create new
+2. Replace all real values in `SETUP.md` with placeholders like `<YOUR_ELEVENLABS_API_KEY>`
+---
+### 1.2 Hardcoded fallback gateway token in `ota-server.js`
+```js
+// Line ~59 in ota-server.js
+const GATEWAY_TOKEN = process.env.GATEWAY_TOKEN || "YOUR_GATEWAY_TOKEN_HERE";
+```
+This token is now public. Any user who forgets to set the env var silently uses this known token.
+**Action required:**
+Replace with a hard exit if the env var is not set:
+```js
+const GATEWAY_TOKEN = process.env.GATEWAY_TOKEN;
+if (!GATEWAY_TOKEN) {
+  console.error("[ota-server] ERROR: GATEWAY_TOKEN env var is required.");
+  process.exit(1);
+}
+```
+---
+## 🔴 Section 2 — package.json Fixes (Required for npm publish)
+### 2.1 Remove unused `@discordjs/opus` dependency
+`@discordjs/opus` is listed in `dependencies` but is **never imported anywhere** in the source.
+It was replaced by `opusscript` after macOS Gatekeeper rejected its prebuilt native binary.
+Leaving it in `dependencies` means every user downloads and tries to install a native binary
+they don't need — and it will fail on many systems.
+```bash
+cd extensions/esp32-voice
+npm uninstall @discordjs/opus
+```
+**Why opusscript needs no install script:**
+`opusscript` is pure JavaScript/WebAssembly. It requires:
+- No native compilation
+- No `node-gyp`
+- No system libraries (no `libopus` to install)
+- No pre/post-install scripts
+It installs cleanly on macOS, Linux, and Windows via a plain `npm install`. No extra steps for users.
+---
+### 2.2 Update version to CalVer
+All OpenClaw extensions use CalVer format (`YYYY.M.D`). This package has `"version": "1.0.0"`.
+Change to: `"version": "2026.2.21"` (or the date of first publish)
+---
+### 2.3 Add `ota-server.js` to the `files` array
+`ota-server.js` is not in the `files` array, so it is excluded from the npm package.
+Users who install the plugin get no OTA server.
+Current `files` array:
+```json
+"files": ["index.ts", "src/", "openclaw.plugin.json", "README.md"]
+```
+Should be:
+```json
+"files": ["index.ts", "src/", "openclaw.plugin.json", "README.md", "ota-server.js", ".env.example"]
+```
+---
+### 2.4 Add `.env.example` file (new file to create)
+Users need to know what env vars to set. Create `extensions/esp32-voice/.env.example`:
+```bash
+# Required — get free key at https://console.deepgram.com
+DEEPGRAM_API_KEY=<your-deepgram-api-key>
+# Required — get free key at https://elevenlabs.io
+ELEVENLABS_API_KEY=<your-elevenlabs-api-key>
+# Optional — find voice IDs at https://elevenlabs.io/voice-library
+# Default: Rachel (21m00Tcm4TlvDq8ikWAM)
+ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM
+# Optional — default: eleven_turbo_v2_5
+ELEVENLABS_MODEL_ID=eleven_turbo_v2_5
+# Optional — default: nova-2
+DEEPGRAM_MODEL=nova-2
+# Optional — port for ESP32 WebSocket server (default: 8765)
+ESP32_VOICE_PORT=8765
+# Required for OTA server — copy from your openclaw.json or gateway setup
+GATEWAY_TOKEN=<your-openclaw-gateway-token>
+# Optional — default: ws://127.0.0.1:18789
+OPENCLAW_GATEWAY_URL=ws://127.0.0.1:18789
+```
+---
+## 🟡 Section 3 — Setup Experience for New Users
+### What a new user has to do today (too many steps)
+1. Install OpenClaw and the plugin
+2. Sign up for Deepgram (STT) — get API key
+3. Sign up for ElevenLabs (TTS) — get API key
+4. Add keys to `~/.openclaw/.env`
+5. Edit `~/.openclaw/openclaw.json` with a device token (generate manually)
+6. Start OpenClaw Gateway in one terminal
+7. Start `ota-server.js` in a **second terminal**
+8. Find their machine's LAN IP address
+9. Flash ESP32 with the OTA URL
+10. Reboot ESP32 and hope auto-detect worked
+**Pain point:** Two separate processes, manual IP lookup, unclear token generation.
+---
+### 3.1 Fix hardcoded timezone in `ota-server.js`
+The OTA server hardcodes IST (UTC+5:30 = 330 minutes). Users in other timezones get wrong device time on their ESP32.
+Current:
+```js
+timezone_offset: 330  // hardcoded IST
+```
+Fix:
+```js
+timezone_offset: -new Date().getTimezoneOffset()  // reads system timezone
+```
+> `getTimezoneOffset()` returns minutes west of UTC (negative for east of UTC),
+> so negating it gives the standard "minutes east of UTC" that XiaoZhi expects.
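The sign convention in the note above is easy to get backwards, so here is a minimal sketch of the conversion. The helper name `minutesEastOfUtc` and the stand-in objects are hypothetical and not part of `ota-server.js`; the helper accepts any object with a `getTimezoneOffset()` method so it can be checked without depending on the machine's real timezone.

```typescript
// Hypothetical helper, for illustration only.
type HasTimezoneOffset = Pick<Date, "getTimezoneOffset">;

function minutesEastOfUtc(now: HasTimezoneOffset = new Date()): number {
  // getTimezoneOffset() is UTC minus local time in minutes (IST gives -330).
  // Subtracting from 0 flips the sign and avoids returning -0 for UTC.
  return 0 - now.getTimezoneOffset();
}

// Stand-ins for a machine in IST (UTC+5:30) and one in US Eastern standard time (UTC-5).
const ist: HasTimezoneOffset = { getTimezoneOffset: () => -330 };
const est: HasTimezoneOffset = { getTimezoneOffset: () => 300 };

console.log(minutesEastOfUtc(ist)); // 330
console.log(minutesEastOfUtc(est)); // -300
```

Calling `minutesEastOfUtc()` with no argument reads the system timezone, which is the behaviour the fix in 3.1 relies on.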
+---
+### 3.2 Integrate OTA endpoint into the plugin HTTP handler (removes second terminal)
+Currently `ota-server.js` must be run as a separate process. The plugin already has an
+HTTP handler (`src/http-handler.ts`). Adding the OTA route there means users only run
+one command — `openclaw gateway` — and everything works.
+The OTA route should be served at:
+```
+http://<your-lan-ip>:18789/__openclaw__/esp32-voice/ota/
+```
+Payload returned (same as ota-server.js currently returns):
+```json
+{
+  "websocket": {
+    "url": "ws://<LAN_IP>:8765/"
+  },
+  "openclaw": {
+    "url": "ws://127.0.0.1:18789",
+    "token": "<GATEWAY_TOKEN>"
+  },
+  "timezone_offset": 330
+}
+```
+Once integrated, update `README.md` to say "OTA is automatically available — no second server needed."
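One way the 3.2 payload could be assembled inside an HTTP handler is sketched below. The field names mirror the JSON shown in 3.2; the `OtaInputs` shape, the function name `buildOtaPayload`, and the example values are assumptions, and nothing here reflects how `src/http-handler.ts` is actually written.

```typescript
// Field names follow the payload in section 3.2; input names are hypothetical.
interface OtaPayload {
  websocket: { url: string };
  openclaw: { url: string; token: string };
  timezone_offset: number; // minutes east of UTC
}

interface OtaInputs {
  lanIp: string;
  voicePort: number;
  gatewayUrl: string;
  gatewayToken: string;
  timezoneOffsetMinutes: number;
}

function buildOtaPayload(input: OtaInputs): OtaPayload {
  if (!input.gatewayToken) {
    // Same policy as section 1.2: never fall back to a default token.
    throw new Error("gateway token is required");
  }
  return {
    websocket: { url: `ws://${input.lanIp}:${input.voicePort}/` },
    openclaw: { url: input.gatewayUrl, token: input.gatewayToken },
    timezone_offset: input.timezoneOffsetMinutes,
  };
}

// Example values only.
const payload = buildOtaPayload({
  lanIp: "192.168.1.10",
  voicePort: 8765,
  gatewayUrl: "ws://127.0.0.1:18789",
  gatewayToken: "example-token",
  timezoneOffsetMinutes: 60,
});
console.log(JSON.stringify(payload, null, 2));
```

Keeping the builder a pure function means the route handler only has to gather inputs and serialize the result.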
+---
+### 3.3 Rewrite `README.md` top section as a 3-step Quick Start
+The current README has good detail but buries the getting-started path. New users need:
+```
+## Quick Start
+### Step 1 — Get API keys (both have free tiers)
+- Deepgram: https://console.deepgram.com → API Keys → Create
+- ElevenLabs: https://elevenlabs.io → Profile → API Keys
+### Step 2 — Configure
+cp .env.example ~/.openclaw/.env
+# Edit ~/.openclaw/.env with your keys
+### Step 3 — Flash your ESP32
+Point your XiaoZhi firmware OTA URL to:
+  http://<your-machine-lan-ip>:18789/__openclaw__/esp32-voice/ota/
+Reboot the device. Done.
+```
+---
+## 🟡 Section 4 — Reliability Improvements
+### 4.1 Persist OTP pairing across Gateway restarts
+Currently paired devices are stored in memory only. If the Gateway restarts, every ESP32
+device must be re-paired with a new OTP, which is disruptive for always-on devices.
+Fix: persist the paired device map to `~/.openclaw/esp32-voice-devices.json` on each
+successful pairing and reload it on startup.
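A minimal sketch of the save and reload steps follows, using only Node's built-in `fs`, `os`, and `path` modules. The `PairedDevice` shape and both function names are hypothetical (the plugin's real device record may carry different fields), and the example writes to a temporary directory rather than `~/.openclaw/`.

```typescript
import { existsSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical record shape, for illustration only.
interface PairedDevice {
  deviceId: string;
  pairedAt: string; // ISO 8601 timestamp
}

function savePairedDevices(file: string, devices: Map<string, PairedDevice>): void {
  const tmp = `${file}.tmp`;
  writeFileSync(tmp, JSON.stringify(Array.from(devices.values()), null, 2), "utf8");
  // Write-then-rename so a crash mid-write cannot leave a truncated file behind.
  renameSync(tmp, file);
}

function loadPairedDevices(file: string): Map<string, PairedDevice> {
  if (!existsSync(file)) return new Map<string, PairedDevice>();
  const list = JSON.parse(readFileSync(file, "utf8")) as PairedDevice[];
  return new Map<string, PairedDevice>(
    list.map((d): [string, PairedDevice] => [d.deviceId, d]),
  );
}

// Round trip through a throwaway directory.
const dir = mkdtempSync(join(tmpdir(), "esp32-voice-"));
const file = join(dir, "esp32-voice-devices.json");
const paired = new Map<string, PairedDevice>([
  ["aa:bb:cc:dd:ee:ff", { deviceId: "aa:bb:cc:dd:ee:ff", pairedAt: new Date().toISOString() }],
]);
savePairedDevices(file, paired);
const restored = loadPairedDevices(file);
console.log(restored.has("aa:bb:cc:dd:ee:ff")); // true
```

A real implementation would also need to decide what to do with a corrupt file; this sketch simply lets `JSON.parse` throw.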
+---
+### 4.2 Add rate limiting to OTP verification
+The OTP is 6 digits (1,000,000 possible values). Without rate limiting, someone on the
+same LAN could brute-force it in minutes. Add a per-IP attempt counter: lock out for
+15 minutes after 5 failed attempts.
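The per-IP counter described in 4.2 could look roughly like the sketch below. The class and method names are hypothetical and not taken from `src/device/device-otp.ts`; the clock is passed in as a parameter so the lockout can be exercised without waiting 15 minutes.

```typescript
// Hypothetical limiter, for illustration only.
const MAX_FAILURES = 5;
const LOCKOUT_MS = 15 * 60 * 1000;

interface AttemptState {
  failures: number;
  lockedUntil: number; // ms since epoch; 0 means not locked
}

class OtpRateLimiter {
  private readonly attempts = new Map<string, AttemptState>();

  /** True if this IP may try an OTP at time `now` (ms since epoch). */
  isAllowed(ip: string, now: number = Date.now()): boolean {
    const state = this.attempts.get(ip);
    return !state || now >= state.lockedUntil;
  }

  /** Record a wrong OTP; the fifth failure starts a 15-minute lockout. */
  recordFailure(ip: string, now: number = Date.now()): void {
    let state = this.attempts.get(ip);
    // Start from zero if there is no record or a previous lockout has expired.
    if (!state || (state.lockedUntil > 0 && now >= state.lockedUntil)) {
      state = { failures: 0, lockedUntil: 0 };
    }
    state.failures += 1;
    if (state.failures >= MAX_FAILURES) {
      state.lockedUntil = now + LOCKOUT_MS;
    }
    this.attempts.set(ip, state);
  }

  /** A correct OTP clears the counter for that IP. */
  recordSuccess(ip: string): void {
    this.attempts.delete(ip);
  }
}

const limiter = new OtpRateLimiter();
const t0 = 1000000;
for (let i = 0; i < 5; i++) limiter.recordFailure("192.168.1.23", t0);
console.log(limiter.isAllowed("192.168.1.23", t0 + 1000)); // false
console.log(limiter.isAllowed("192.168.1.23", t0 + LOCKOUT_MS)); // true
```

This sketch never expires counters that stay below the threshold, so a long-running process would want a periodic sweep as well.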
+---
+### 4.3 Auto-reconnect to Gateway on disconnect
+If the OpenClaw Gateway drops the connection (restart, timeout, network blip),
+`openclawConnected` goes false and stays false until the ESP32 session is restarted.
+The session should reconnect automatically with exponential backoff (3s → 6s → 12s → 24s, capped at 30s).
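The delay schedule for 4.3 reduces to one small function. This is a sketch under the stated numbers (3s base, doubling, 30s cap); the function and constant names are hypothetical, and the actual reconnect loop and timer wiring are left out.

```typescript
// Hypothetical names, for illustration only.
const BASE_DELAY_MS = 3000;
const MAX_DELAY_MS = 30000;

/** Delay before reconnect attempt number `attempt` (0-based). */
function reconnectDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
}

const schedule = [0, 1, 2, 3, 4, 5].map(reconnectDelayMs);
console.log(schedule); // [ 3000, 6000, 12000, 24000, 30000, 30000 ]
```

The attempt counter should reset to 0 after a successful connection so the next outage starts again at 3s.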
+---
+## 🟢 Section 5 — Pre-publish Checklist
+Run through this before `npm publish`:
+- [ ] Rotated the exposed API keys (ElevenLabs + Gemini)
+- [ ] `SETUP.md` has no real credentials — only `<PLACEHOLDER>` values
+- [ ] Hardcoded fallback token removed from `ota-server.js`
+- [ ] `@discordjs/opus` removed from `package.json` dependencies
+- [ ] Version updated to CalVer (e.g. `2026.2.21`)
+- [ ] `ota-server.js` added to `files` array in `package.json`
+- [ ] `.env.example` created and added to `files` array
+- [ ] Timezone fix applied in `ota-server.js`
+- [ ] README has a 3-step Quick Start at the very top
+- [ ] Ran `npm pack --dry-run` and verified:
+  - No `node_modules/` in the package
+  - No `.env` file in the package
+  - No real credentials in any file
+- [ ] Test clean install: `mkdir /tmp/test && cd /tmp/test && npm install @cheeko-ai/esp32-voice`
+Then publish:
+```bash
+cd extensions/esp32-voice
+npm login
+npm publish --access public
+```
+---
+## Summary Table
+| # | Issue | Severity | Effort |
+|---|---|---|---|
+| 1.1 | Real API keys in SETUP.md | 🔴 Critical | 5 min |
+| 1.2 | Hardcoded fallback token in ota-server.js | 🔴 Critical | 5 min |
+| 2.1 | Remove unused @discordjs/opus | 🔴 Blocker | 2 min |
+| 2.2 | Version to CalVer | 🔴 Blocker | 1 min |
+| 2.3 | Add ota-server.js to files array | 🔴 Blocker | 2 min |
+| 2.4 | Create .env.example | 🟡 Important | 10 min |
+| 3.1 | Fix hardcoded IST timezone | 🟡 Important | 5 min |
+| 3.2 | Integrate OTA into plugin HTTP handler | 🟡 Important | 2–3 hours |
+| 3.3 | Rewrite README Quick Start | 🟡 Important | 30 min |
+| 4.1 | Persist OTP pairing to disk | 🟡 Nice to have | 1 hour |
+| 4.2 | Rate limit OTP attempts | 🟡 Nice to have | 1 hour |
+| 4.3 | Auto-reconnect to Gateway | 🟡 Nice to have | 1 hour |
+**Minimum to publish safely: items 1.1, 1.2, 2.1, 2.2, 2.3**
+**Minimum for a good user experience: all of Section 2 + Section 3**

package/README.md ADDED

@@ -0,0 +1,290 @@
+# 🎤 ESP32 Voice — OpenClaw Extension
+Turn a XiaoZhi ESP32 board into a voice AI assistant powered by OpenClaw.
+Push to talk → speak → get a spoken response. Integrates with the Cheeko dashboard for device management.
+---
+## Install
+```bash
+npm install @cheeko-ai/esp32-voice
+```
+Or via OpenClaw plugin system:
+```bash
+openclaw channels add
+# Select "ESP32 Voice (plugin)" from the menu
+```
+---
+## Quick Start
+### Step 1 — Run the setup wizard
+```bash
+pnpm openclaw channels add
+```
+Select **ESP32 Voice (plugin)** from the channel menu. The interactive wizard guides you through:
+| Step | What happens |
+|------|-------------|
+| **1. Connect to Cheeko** | Opens dashboard link → you log in → paste the pairing token |
+| **2. STT setup** | Enter your Deepgram API key |
+| **3. TTS setup** | Enter your ElevenLabs API key + voice ID |
+| **4. Add device** | Opens dashboard to add your ESP32 device |
+All keys are saved to `~/.openclaw/.env` automatically — you only do this once.
+> **Note:** Use Node.js 22. Run `nvm use 22` before any openclaw commands.
+---
+### Step 2 — Start the Gateway
+```bash
+openclaw gateway
+```
+On startup the plugin:
+- Starts the voice WebSocket server on port **8765**
+- Auto-registers your machine's WebSocket URL with the Cheeko dashboard (if `CHEEKO_PAIR` is set)
+---
+### Step 3 — Start the OTA server
+The OTA server tells your ESP32 where to connect on boot:
+```bash
+GATEWAY_TOKEN=<your-gateway-token> node $(openclaw plugins path @cheeko-ai/esp32-voice)/ota-server.js
+```
+It prints your URLs:
+```
+🦞 ESP32 OTA Mock Server
+   Auto-detected MAC IP : 192.168.1.10
+   OTA Server           : http://192.168.1.10:8080/xiaozhi/ota/
+   Voice WebSocket      : ws://192.168.1.10:8765/
+```
+> **Gateway token** — found in `~/.openclaw/openclaw.json` under `gateway.auth.token`.
+---
+### Step 4 — Flash your ESP32
+In your XiaoZhi firmware settings, set the OTA URL to what the server printed:
+```
+http://192.168.1.10:8080/xiaozhi/ota/
+```
+Reboot the device. It fetches its config, connects to the voice server, and is ready.
+**Hold the button → speak → release → hear the response.**
+---
+## How It Works
+```
+ESP32 (XiaoZhi firmware)
+  │  Opus audio frames  →  WebSocket port 8765
+  ▼
+[esp32-voice plugin]
+  │  STT: Deepgram       →  transcript text
+  │  LLM: OpenClaw Gateway (port 18789)  →  response text
+  │  TTS: ElevenLabs     →  Opus audio frames
+  ▼
+ESP32 speaker
+```
+The plugin runs its own WebSocket server on port **8765** — completely separate from the OpenClaw Gateway port (18789). No changes to OpenClaw core are needed.
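The three stages in the diagram can be read as a simple async pipeline. The sketch below uses hypothetical function types and stub providers; it is not the interface defined in `src/stt/stt-provider.ts` or `src/tts/tts-provider.ts`, and it ignores streaming, which a real voice session would likely need.

```typescript
// Hypothetical stage signatures, for illustration only.
type Transcribe = (opusFrames: Uint8Array[]) => Promise<string>;
type Ask = (text: string) => Promise<string>;
type Synthesize = (text: string) => Promise<Uint8Array[]>;

async function handleUtterance(
  frames: Uint8Array[],
  stt: Transcribe,
  llm: Ask,
  tts: Synthesize,
): Promise<Uint8Array[]> {
  const transcript = (await stt(frames)).trim();
  if (transcript === "") return []; // nothing recognised, so nothing to speak
  const reply = await llm(transcript);
  return tts(reply);
}

// Stub providers standing in for Deepgram, the Gateway, and ElevenLabs.
const stubStt: Transcribe = async () => "  hello ";
const stubLlm: Ask = async (text) => `you said ${text}`;
const stubTts: Synthesize = async (text) => [new Uint8Array(text.length)];

handleUtterance([new Uint8Array([1, 2, 3])], stubStt, stubLlm, stubTts).then((out) => {
  console.log(out.length); // 1
});
```

Passing the stages in as functions keeps the pipeline independent of any one provider, which matches the registry files listed in the package.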
+---
+## Cheeko Dashboard Pairing
+The plugin auto-registers your machine's voice URL with the Cheeko dashboard when `CHEEKO_PAIR` is set.
+**How the pairing works:**
+1. Log in to the Cheeko dashboard → **Settings → Connect OpenClaw**
+2. The dashboard generates a short pairing token (e.g. `XK9-2M4`)
+3. Paste it into the setup wizard (or set `CHEEKO_PAIR=XK9-2M4` in `~/.openclaw/.env`)
+4. On next gateway start, the plugin POSTs your voice URL to the dashboard automatically
+5. Your ESP32 devices in the dashboard now know where to connect
+```
+Dashboard generates token  →  you paste it in wizard
+    ↓
+Plugin saves CHEEKO_PAIR to ~/.openclaw/.env
+    ↓
+openclaw gateway starts
+    ↓
+Plugin POSTs ws://<your-ip>:8765/ to dashboard API
+    ↓
+Dashboard stores your OpenClaw URL against your account
+    ↓
+ESP32 devices fetch config → connect to your machine
+```
+---
+## Configuration Reference
+All config goes in `~/.openclaw/openclaw.json` under `channels.esp32voice`.
+Keys can also be set in `~/.openclaw/.env`.
+### Minimal config
+```json5
+{
+  channels: {
+    esp32voice: {
+      enabled: true,
+    },
+  },
+}
+```
+Keys are read from env vars automatically:
+```bash
+# ~/.openclaw/.env
+DEEPGRAM_API_KEY=your-deepgram-key
+ELEVENLABS_API_KEY=your-elevenlabs-key
+CHEEKO_PAIR=XK9-2M4
+```
+### Full config
+```json5
+{
+  channels: {
+    esp32voice: {
+      enabled: true,
+      sttApiKey: "your-deepgram-key",
+      ttsApiKey: "your-elevenlabs-key",
+      ttsVoiceId: "21m00Tcm4TlvDq8ikWAM",  // optional, defaults to Rachel
+      language: "en",
+      maxResponseLength: 500,
+      voiceOptimized: true,
+    },
+  },
+}
+```
+### All options
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| `enabled` | boolean | `true` | Enable/disable the channel |
+| `sttProvider` | string | `"deepgram"` | STT provider ID |
+| `sttApiKey` | string | — | Deepgram API key |
+| `sttModel` | string | `"nova-2"` | Deepgram model |
+| `ttsProvider` | string | `"elevenlabs"` | TTS provider ID |
+| `ttsApiKey` | string | — | ElevenLabs API key |
+| `ttsVoiceId` | string | Rachel | ElevenLabs voice ID |
+| `ttsModel` | string | `"eleven_turbo_v2_5"` | ElevenLabs model |
+| `language` | string | `"en"` | Language code (ISO 639-1) |
+| `maxResponseLength` | number | `500` | Max response chars (keep short for voice) |
+| `voiceOptimized` | boolean | `true` | Tells the AI to respond concisely without markdown |
+### Environment variables
+| Variable | Description |
+|----------|-------------|
+| `DEEPGRAM_API_KEY` | Deepgram STT API key |
+| `ELEVENLABS_API_KEY` | ElevenLabs TTS API key |
+| `ELEVENLABS_VOICE_ID` | ElevenLabs voice ID (optional) |
+| `ELEVENLABS_MODEL_ID` | ElevenLabs model (optional) |
+| `DEEPGRAM_MODEL` | Deepgram model (optional) |
+| `ESP32_VOICE_PORT` | Voice WebSocket server port (default: `8765`) |
+| `GATEWAY_TOKEN` | Required for OTA server |
+| `CHEEKO_PAIR` | Pairing token from Cheeko dashboard — enables auto-registration |
+| `CHEEKO_DASHBOARD_URL` | Cheeko dashboard UI URL (default: `http://64.227.170.31:8001`) |
+| `CHEEKO_API_URL` | Cheeko backend API URL (default: `http://64.227.170.31:8002/toy`) |
+| `MAC_IP` | Override auto-detected LAN IP (useful if machine has multiple interfaces) |
+---
+## OTA Server
+The OTA server (`ota-server.js`) is a small HTTP server the ESP32 calls on boot to get its config — WebSocket URL, auth token, and timezone.
+```bash
+# Basic
+GATEWAY_TOKEN=<token> node ota-server.js
+# With overrides
+MAC_IP=192.168.1.50 VOICE_PORT=8765 OTA_PORT=8080 GATEWAY_TOKEN=<token> node ota-server.js
+```
+| Env var | Default | Description |
+|---------|---------|-------------|
+| `GATEWAY_TOKEN` | — | **Required.** Your OpenClaw gateway token |
+| `MAC_IP` | auto-detected | Your machine's LAN IP |
+| `VOICE_PORT` | `8765` | Voice WebSocket port |
+| `OTA_PORT` | `8080` | OTA server port |
+| `TZ_OFFSET` | system timezone | Minutes east of UTC (e.g. `330` for IST) |
+---
+## Gateway HTTP Endpoints
+The plugin also registers utility endpoints on the OpenClaw Gateway port (18789):
+| Endpoint | Description |
+|----------|-------------|
+| `GET /__openclaw__/esp32-voice/health` | Health check — shows configured STT/TTS status |
+| `GET /__openclaw__/esp32-voice/otp` | Generate a one-time device pairing code |
+| `GET /__openclaw__/esp32-voice/devices` | List currently paired devices |
+| `GET /__openclaw__/esp32-voice/stream` | Info about the voice WebSocket URL |
+---
+## Troubleshooting
+**ESP32 shows "connecting" but never "listening"**
+- Check the OTA server is running and the ESP32 fetched its config (watch OTA server logs)
+- Make sure firewall allows port 8765 inbound
+- Confirm `GATEWAY_TOKEN` matches `gateway.auth.token` in `~/.openclaw/openclaw.json`
+**Response is always "HEARTBEAT_OK"**
+- This was a known bug — fixed in v2026.2.21. Update to latest.
+**Dashboard pairing fails**
+- Make sure you paste only the short token (e.g. `XK9-2M4`), not the full command string
+- The token expires after 10 minutes — generate a new one from the dashboard if needed
+- Confirm the backend API is reachable: `curl http://64.227.170.31:8002/toy/health`
+**No audio from ESP32 speaker**
+- Plugin outputs 24kHz 16-bit mono Opus at 60ms frames — confirm firmware matches
+- Check ElevenLabs key is valid and has quota remaining
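When comparing the plugin's audio format against the firmware, it helps to know what one frame amounts to once decoded. The arithmetic below follows directly from the figures quoted above (24 kHz, 16-bit, mono, 60 ms); the constant names are only illustrative. The Opus-encoded frame on the wire is smaller and varies in size.

```typescript
// Figures from the troubleshooting note; names are illustrative.
const SAMPLE_RATE_HZ = 24000;
const FRAME_DURATION_MS = 60;
const BYTES_PER_SAMPLE = 2; // 16-bit PCM
const CHANNELS = 1; // mono

const samplesPerFrame = (SAMPLE_RATE_HZ * FRAME_DURATION_MS) / 1000;
const pcmBytesPerFrame = samplesPerFrame * BYTES_PER_SAMPLE * CHANNELS;

console.log(samplesPerFrame); // 1440
console.log(pcmBytesPerFrame); // 2880
```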
+**STT timeout / empty transcript**
+- Validate Deepgram key: `curl https://api.deepgram.com/v1/auth -H "Authorization: Token YOUR_KEY"`
+- Check the ESP32 is actually sending audio (hold button while speaking)
+**"device signature invalid" in gateway logs**
+- Device identity at `~/.openclaw/identity/device.json` may be missing
+- Run `openclaw onboard` to regenerate it
+---
+## Supported Hardware
+Tested with:
+- **Jiuchuan S3** (XiaoZhi ESP32-S3 board) — recommended
+- Any ESP32 board running [XiaoZhi firmware](https://github.com/78/xiaozhi-esp32)
+---
+## License
+MIT — Published under [@cheeko-ai](https://www.npmjs.com/org/cheeko-ai) on npm.