@cheeko-ai/esp32-voice 2026.2.21

package/TODO.md ADDED
@@ -0,0 +1,418 @@
1
+ # ESP32-Voice Plugin — Remaining Work TODO
2
+
3
+ > This document tracks everything that needs to be done before the plugin is production-ready
4
+ > and publishable to npm. Work through each section top-to-bottom. Each item is self-contained
5
+ > with enough context so a fresh contributor can pick it up without prior knowledge of the session.
6
+
7
+ ---
8
+
9
+ ## 🔴 SECTION 1 — Security (Do This Before Anything Else)
10
+
11
+ ### 1.1 — Delete or sanitize `SETUP.md`
12
+
13
+ **Why:** `SETUP.md` contains real, live API keys committed to the repository. Anyone who clones
14
+ the repo has these keys.
15
+
16
+ **Steps:**
17
+ 1. Open `extensions/esp32-voice/SETUP.md`
18
+ 2. Replace every real credential with a placeholder:
19
+ - `GEMINI_API_KEY=AIzaSy...` → `GEMINI_API_KEY=<YOUR_GEMINI_API_KEY>`
20
+ - `ELEVENLABS_API_KEY=sk_...` → `ELEVENLABS_API_KEY=<YOUR_ELEVENLABS_API_KEY>`
21
+ - Any token or secret string → `<YOUR_TOKEN_HERE>`
22
+ 3. Add a note at the top: ``> **Note:** Replace all `<PLACEHOLDER>` values with your own credentials.``
23
+ 4. Immediately rotate the exposed keys:
24
+ - Go to [ElevenLabs API keys](https://elevenlabs.io/app/settings/api-keys) → delete old key → create new
25
+ - Go to [Google AI Studio](https://aistudio.google.com/apikey) → delete old key → create new
26
+ 5. Commit: `"security: remove exposed credentials from SETUP.md"`
27
+
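After sanitizing, a quick scan can help confirm that no key-shaped strings remain in `SETUP.md`. The sketch below is hypothetical: the name `findLikelySecrets` and the two patterns (derived only from the `AIzaSy...` and `sk_...` prefixes in step 2) are assumptions, and a dedicated secret scanner covers far more formats.

```typescript
// Hypothetical helper: flags lines that still look like real credentials.
// The patterns only cover the two key prefixes mentioned in step 2.
const SECRET_PATTERNS: RegExp[] = [
  /AIzaSy[0-9A-Za-z_-]{10,}/, // Google-style API key prefix
  /sk_[0-9A-Za-z]{10,}/, // ElevenLabs-style key prefix
];

// Returns the 1-based numbers of lines that match any pattern.
function findLikelySecrets(text: string): number[] {
  const hits: number[] = [];
  text.split("\n").forEach((line, index) => {
    if (SECRET_PATTERNS.some((pattern) => pattern.test(line))) {
      hits.push(index + 1);
    }
  });
  return hits;
}
```

An empty result does not prove the file is clean; the exposed keys still need to be rotated as described in step 4.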
28
+ ---
29
+
30
+ ### 1.2 — Remove hardcoded fallback token from `ota-server.js`
31
+
32
+ **Why:** The gateway token in `ota-server.js` (line ~59) has a hardcoded default:
33
+ ```js
34
+ const GATEWAY_TOKEN = process.env.GATEWAY_TOKEN || "YOUR_GATEWAY_TOKEN_HERE";
35
+ ```
36
+ This token is now public. If the env var is not set, the server should exit with a clear error,
37
+ not fall back to a known value.
38
+
39
+ **Steps:**
40
+ 1. Open `extensions/esp32-voice/ota-server.js`
41
+ 2. Find the `GATEWAY_TOKEN` line
42
+ 3. Replace with:
43
+ ```js
44
+ const GATEWAY_TOKEN = process.env.GATEWAY_TOKEN;
45
+ if (!GATEWAY_TOKEN) {
46
+ console.error("[ota-server] ERROR: GATEWAY_TOKEN env var is required. Set it in your .env file.");
47
+ process.exit(1);
48
+ }
49
+ ```
50
+ 4. Update the README/QUICKSTART to mention this env var is required
51
+ 5. Commit: `"security: require GATEWAY_TOKEN env var in ota-server, remove hardcoded fallback"`
52
+
53
+ ---
54
+
55
+ ## 🔴 SECTION 2 — Package.json Cleanup (Required for npm publish)
56
+
57
+ ### 2.1 — Remove unused `@discordjs/opus` dependency
58
+
59
+ **Why:** During development, `@discordjs/opus` was replaced with `opusscript` because macOS
60
+ Gatekeeper rejects the `@discordjs/opus` prebuilt native binary. The package is still listed in
61
+ `package.json` but is never imported anywhere in the source code.
62
+
63
+ **Steps:**
64
+ 1. Verify it's unused:
65
+ ```bash
66
+ grep -r "@discordjs/opus\|require.*opus\|import.*opus" extensions/esp32-voice/src/
67
+ # Should only find opusscript references, not @discordjs/opus
68
+ ```
69
+ 2. Remove it:
70
+ ```bash
71
+ cd extensions/esp32-voice
72
+ npm uninstall @discordjs/opus
73
+ ```
74
+ 3. Verify `opusscript` is still in `dependencies` in `package.json`
75
+ 4. Commit: `"chore: remove unused @discordjs/opus dependency, use opusscript only"`
76
+
77
+ **Note on opusscript:** `opusscript` is pure JavaScript/WebAssembly — it needs NO native
78
+ compilation, NO node-gyp, NO system libraries. It installs cleanly on macOS, Linux and Windows.
79
+ No pre/post-install scripts are needed.
80
+
81
+ ---
82
+
83
+ ### 2.2 — Update version to CalVer format
84
+
85
+ **Why:** All OpenClaw extensions use CalVer (`YYYY.M.D` e.g. `2026.2.21`). This package
86
+ has `"version": "1.0.0"` which is inconsistent.
87
+
88
+ **Steps:**
89
+ 1. Open `extensions/esp32-voice/package.json`
90
+ 2. Change `"version": "1.0.0"` to today's date in CalVer format, e.g. `"version": "2026.2.21"`
91
+ 3. Commit: `"chore: align version to CalVer format"`
92
+
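The `YYYY.M.D` string can be derived from a date rather than typed by hand. This is a minimal sketch: the function name `toCalVer` and the choice of UTC fields are assumptions. The components are deliberately not zero-padded, since semver-style versions do not allow leading zeros in numeric parts.

```typescript
// Formats a date as CalVer YYYY.M.D (no zero padding). UTC fields are used so
// the result does not depend on the machine's timezone (an assumption here).
function toCalVer(date: Date): string {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth() + 1; // getUTCMonth() is zero-based
  const day = date.getUTCDate();
  return `${year}.${month}.${day}`;
}
```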
93
+ ---
94
+
95
+ ### 2.3 — Add `ota-server.js` to the `files` array
96
+
97
+ **Why:** `ota-server.js` is not listed in the `files` array in `package.json`. When the package
98
+ is published to npm, this file will be excluded and users won't have the OTA server.
99
+
100
+ **Steps:**
101
+ 1. Open `extensions/esp32-voice/package.json`
102
+ 2. Find the `files` array (currently: `["index.ts", "src/", "openclaw.plugin.json", "README.md"]`)
103
+ 3. Add `"ota-server.js"` and `"TODO.md"` to the array:
104
+ ```json
105
+ "files": [
106
+ "index.ts",
107
+ "src/",
108
+ "openclaw.plugin.json",
109
+ "README.md",
110
+ "ota-server.js",
111
+ "TODO.md"
112
+ ]
113
+ ```
114
+ 4. Commit: `"chore: add ota-server.js and TODO.md to npm files array"`
115
+
116
+ ---
117
+
118
+ ## 🟡 SECTION 3 — OTA Server Integration (High Priority UX)
119
+
120
+ ### 3.1 — Fix hardcoded timezone (IST) in `ota-server.js`
121
+
122
+ **Why:** The OTA server response hardcodes the timezone offset to IST (UTC+5:30 = 330 minutes).
123
+ Users in other timezones will get wrong device time on their ESP32.
124
+
125
+ **Current code (in `ota-server.js`):**
126
+ ```js
127
+ timezone_offset: 330 // hardcoded IST — wrong for everyone else
128
+ ```
129
+
130
+ **Steps:**
131
+ 1. Open `extensions/esp32-voice/ota-server.js`
132
+ 2. Find the `timezone_offset` line
133
+ 3. Replace with a dynamic calculation:
134
+ ```js
135
+ // Get local UTC offset in minutes (negative for west, positive for east)
136
+ timezone_offset: -new Date().getTimezoneOffset(),
137
+ ```
138
+ > `getTimezoneOffset()` returns minutes west of UTC (negative for east), so negate it to get
139
+ > the standard "minutes east of UTC" that XiaoZhi firmware expects.
140
+
141
+ 4. Commit: `"fix: use system timezone instead of hardcoded IST in ota-server"`
142
+
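The sign convention is easy to get backwards, so a small sketch may help. The helper names `localOffsetMinutesEast` and `formatOffset` are hypothetical and not part of `ota-server.js`; the second one exists only to make the value readable in logs. Note that this reports the offset of the machine running the server, which assumes the ESP32 is in the same timezone.

```typescript
// Minutes east of UTC for the machine running this code (e.g. 330 for IST).
// getTimezoneOffset() counts minutes west of UTC, hence the negation.
function localOffsetMinutesEast(now: Date = new Date()): number {
  return -now.getTimezoneOffset();
}

// Hypothetical log helper: renders an offset in minutes as "+HH:MM" or "-HH:MM".
function formatOffset(minutesEast: number): string {
  const sign = minutesEast < 0 ? "-" : "+";
  const abs = Math.abs(minutesEast);
  const hours = String(Math.floor(abs / 60)).padStart(2, "0");
  const minutes = String(abs % 60).padStart(2, "0");
  return `${sign}${hours}:${minutes}`;
}
```

For example, `formatOffset(330)` gives `+05:30`, which matches the previously hardcoded IST value.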
143
+ ---
144
+
145
+ ### 3.2 — Integrate OTA endpoint into the plugin HTTP handler
146
+
147
+ **Why:** Currently users must run `node ota-server.js` as a completely separate process in a
148
+ separate terminal. This is confusing and adds friction. The plugin already registers an HTTP
149
+ handler (`src/http-handler.ts`). Adding the OTA route there means users just start OpenClaw
150
+ normally — no second process.
151
+
152
+ **Steps:**
153
+ 1. Open `extensions/esp32-voice/src/http-handler.ts`
154
+ 2. Identify where HTTP routes are registered (look for `app.get(...)` or similar)
155
+ 3. Add a route for `/xiaozhi/ota/` or `/__openclaw__/esp32-voice/ota/` that returns the same
156
+ JSON payload currently in `ota-server.js`:
157
+ ```typescript
158
+ // The payload the XiaoZhi firmware expects
159
+ {
160
+ "websocket": {
161
+ "url": "ws://<LAN_IP>:<ESP32_VOICE_PORT>/"
162
+ },
163
+ "openclaw": {
164
+ "url": "ws://127.0.0.1:18789",
165
+ "token": "<GATEWAY_TOKEN>"
166
+ },
167
+ "timezone_offset": -new Date().getTimezoneOffset()
168
+ }
169
+ ```
170
+ 4. Get the LAN IP using the same interface detection logic already in `ota-server.js`
171
+ (prefer `en0`, `en1`, `eth0`, `wlan0`, `wlo1`)
172
+ 5. Once integrated, update `README.md` to say "OTA is served automatically at
173
+ `http://<your-ip>:18789/__openclaw__/esp32-voice/ota/`" — no separate server needed
174
+ 6. Keep `ota-server.js` as a standalone fallback option for users who run the plugin without
175
+ the full Gateway
176
+ 7. Commit: `"feat: integrate OTA endpoint into plugin HTTP handler"`
177
+
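The interface preference in step 4 could look roughly like the sketch below. This is not the code from `ota-server.js`, which is not shown here: `pickLanIp`, the local `InterfaceAddress` type, and the fallback to any other non-internal interface are all assumptions. The input has the same shape as the result of Node's `os.networkInterfaces()`.

```typescript
// Minimal shape of one entry from Node's os.networkInterfaces().
interface InterfaceAddress {
  address: string;
  family: string | number; // normally "IPv4"; some Node releases reported 4
  internal: boolean;
}

type InterfaceMap = Record<string, InterfaceAddress[] | undefined>;

// Interface names from step 4, in priority order.
const PREFERRED_INTERFACES = ["en0", "en1", "eth0", "wlan0", "wlo1"];

function isExternalIpv4(entry: InterfaceAddress): boolean {
  return (entry.family === "IPv4" || entry.family === 4) && !entry.internal;
}

// Returns the first non-internal IPv4 address, trying the preferred names
// first and then any other interface; undefined if none is found.
function pickLanIp(interfaces: InterfaceMap): string | undefined {
  const names = [
    ...PREFERRED_INTERFACES,
    ...Object.keys(interfaces).filter((n) => !PREFERRED_INTERFACES.includes(n)),
  ];
  for (const name of names) {
    const match = (interfaces[name] ?? []).find(isExternalIpv4);
    if (match) return match.address;
  }
  return undefined;
}
```

In the handler this would be called as `pickLanIp(os.networkInterfaces())` after importing `os` from `node:os`, with the result substituted for `<LAN_IP>` in the payload.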
178
+ ---
179
+
180
+ ## 🟡 SECTION 4 — Developer Experience
181
+
182
+ ### 4.1 — Write proper `README.md` with 3-step quick setup
183
+
184
+ **Why:** The current `README.md` is detailed but scattered. A new user needs a clear "from zero
185
+ to voice in 10 minutes" path at the very top, with details below.
186
+
187
+ **Suggested structure:**
188
+
189
+ ```
190
+ # @openclaw/esp32-voice
191
+
192
+ [One-line description]
193
+
194
+ ## Quick Start (10 minutes)
195
+
196
+ ### Step 1 — Get API Keys
197
+ - Deepgram (free): https://console.deepgram.com → API Keys → Create Key
198
+ - ElevenLabs (free tier): https://elevenlabs.io → Profile → API Keys
199
+
200
+ ### Step 2 — Configure OpenClaw
201
+ [exact env vars and openclaw.json snippet]
202
+
203
+ ### Step 3 — Flash your ESP32
204
+ [exact OTA URL to point firmware at]
205
+
206
+ ## Configuration Reference
207
+ [full table of all options]
208
+
209
+ ## How It Works
210
+ [architecture diagram]
211
+
212
+ ## Troubleshooting
213
+ [common errors and fixes]
214
+ ```
215
+
216
+ **Steps:**
217
+ 1. Open `extensions/esp32-voice/README.md`
218
+ 2. Add the "Quick Start" section as the very first content after the title
219
+ 3. Link to the Deepgram free tier and ElevenLabs free tier pages explicitly
220
+ 4. Show the minimum `~/.openclaw/openclaw.json` block (copy from QUICKSTART.md)
221
+ 5. Commit: `"docs: rewrite README with 3-step quick start at top"`
222
+
223
+ ---
224
+
225
+ ### 4.2 — Add `.env.example` file
226
+
227
+ **Why:** Users need to know what env vars to set. A `.env.example` file is the standard way
228
+ to document this without committing real credentials.
229
+
230
+ **Create `extensions/esp32-voice/.env.example`:**
231
+ ```bash
232
+ # Required — get free API key at https://console.deepgram.com
233
+ DEEPGRAM_API_KEY=<your-deepgram-api-key>
234
+
235
+ # Required — get free API key at https://elevenlabs.io
236
+ ELEVENLABS_API_KEY=<your-elevenlabs-api-key>
237
+
238
+ # Optional — find voice IDs at https://elevenlabs.io/voice-library
239
+ ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM
240
+
241
+ # Optional — override default ElevenLabs model
242
+ ELEVENLABS_MODEL_ID=eleven_turbo_v2_5
243
+
244
+ # Optional — override default Deepgram model
245
+ DEEPGRAM_MODEL=nova-2
246
+
247
+ # Optional — port for ESP32 voice WebSocket server (default: 8765)
248
+ ESP32_VOICE_PORT=8765
249
+
250
+ # Required for OTA server — copy from ~/.openclaw/openclaw.json or your gateway setup
251
+ GATEWAY_TOKEN=<your-openclaw-gateway-token>
252
+
253
+ # Optional — override OpenClaw gateway URL (default: ws://127.0.0.1:18789)
254
+ OPENCLAW_GATEWAY_URL=ws://127.0.0.1:18789
255
+ ```
256
+
257
+ **Steps:**
258
+ 1. Create the file at `extensions/esp32-voice/.env.example` with the content above
259
+ 2. Make sure `.env.example` is in the `files` array in `package.json`
260
+ 3. Add a note in README: "Copy `.env.example` to `~/.openclaw/.env` and fill in your keys"
261
+ 4. Commit: `"docs: add .env.example with all required and optional env vars"`
262
+
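A startup check can catch a `.env` that was copied from `.env.example` but never filled in. The sketch below is an assumption, not existing plugin code: `findMissingEnvVars` is a hypothetical name, and the required list is taken from the variables marked "Required" above (`GATEWAY_TOKEN` is only required when the OTA server is used).

```typescript
// Variables marked "Required" in .env.example.
const REQUIRED_ENV_VARS = ["DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY", "GATEWAY_TOKEN"];

// Returns the names of required variables that are unset, empty, or still
// hold an unfilled <placeholder> value copied from .env.example.
function findMissingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name]?.trim();
    return !value || /^<.*>$/.test(value);
  });
}
```

Calling `findMissingEnvVars(process.env)` at startup and logging the returned names gives users a single clear message instead of a later authentication failure.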
263
+ ---
264
+
265
+ ## 🟡 SECTION 5 — Reliability Improvements
266
+
267
+ ### 5.1 — Persist OTP pairing across restarts
268
+
269
+ **Why:** The OTP pairing system stores approved devices in memory only. If OpenClaw restarts,
270
+ all paired devices need to be re-paired. This is annoying for users with always-on ESP32 devices.
271
+
272
+ **Current code in `src/device/device-otp.ts`:**
273
+ ```typescript
274
+ private pairedDevices: Map<string, PairedDevice> = new Map(); // in-memory only
275
+ ```
276
+
277
+ **Steps:**
278
+ 1. Open `extensions/esp32-voice/src/device/device-otp.ts`
279
+ 2. Add persistence using a JSON file at `~/.openclaw/esp32-voice-devices.json`
280
+ 3. On `DeviceOtpManager` construction, load the file if it exists
281
+ 4. On successful pairing, write the updated map back to the file
282
+ 5. Verify that after an OpenClaw restart, previously paired devices are trusted immediately without re-pairing
283
+ 6. Commit: `"feat: persist paired devices to disk so pairing survives gateway restarts"`
284
+
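The load and save steps above reduce to converting the in-memory `Map` to and from JSON. This is a minimal sketch under stated assumptions: the `PairedDevice` fields shown here are hypothetical (the real type in `device-otp.ts` may differ), and `serializeDevices` / `deserializeDevices` are illustrative names.

```typescript
// Hypothetical record shape; the real PairedDevice type may have other fields.
interface PairedDevice {
  deviceId: string;
  pairedAt: number; // epoch milliseconds
}

// Serializes the in-memory map to a JSON string suitable for writing to disk.
function serializeDevices(devices: Map<string, PairedDevice>): string {
  return JSON.stringify(Array.from(devices.values()), null, 2);
}

// Rebuilds the map from file contents. Malformed or unexpected input yields
// an empty map so that a corrupt file cannot crash startup.
function deserializeDevices(json: string): Map<string, PairedDevice> {
  const result = new Map<string, PairedDevice>();
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    return result;
  }
  if (!Array.isArray(parsed)) return result;
  for (const item of parsed) {
    if (
      typeof item === "object" &&
      item !== null &&
      typeof (item as PairedDevice).deviceId === "string" &&
      typeof (item as PairedDevice).pairedAt === "number"
    ) {
      const device = item as PairedDevice;
      result.set(device.deviceId, { deviceId: device.deviceId, pairedAt: device.pairedAt });
    }
  }
  return result;
}
```

In the manager these would be paired with `fs.readFileSync` and `fs.writeFileSync` on `~/.openclaw/esp32-voice-devices.json`. Writing to a temporary file and renaming it into place reduces the chance of leaving a half-written file behind.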
285
+ ---
286
+
287
+ ### 5.2 — Add rate limiting to OTP verification
288
+
289
+ **Why:** The OTP is 6 digits (1,000,000 possible values). Without rate limiting, an attacker
290
+ on the same network could brute-force the OTP in minutes. The OTA HTTP endpoint and the
291
+ voice WebSocket both need protection.
292
+
293
+ **Steps:**
294
+ 1. Open `extensions/esp32-voice/src/device/device-otp.ts`
295
+ 2. Add a failed-attempt counter per source IP
296
+ 3. After 5 failed attempts from the same IP, block for 15 minutes
297
+ 4. Log blocked attempts at `warn` level
298
+ 5. Commit: `"security: add rate limiting to OTP verification (5 attempts then 15min block)"`
299
+
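Steps 2 and 3 can be sketched as a small per-IP limiter. The class name `OtpRateLimiter` and its methods are hypothetical and would need to be wired into the existing verification path; the clock is injectable so that the 15-minute block can be tested without waiting.

```typescript
const MAX_FAILURES = 5;
const BLOCK_MS = 15 * 60 * 1000;

interface AttemptState {
  failures: number;
  blockedUntil: number; // epoch ms; 0 when not blocked
}

// Hypothetical limiter keyed by source IP.
class OtpRateLimiter {
  private readonly attempts = new Map<string, AttemptState>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // True while the IP is inside its block window.
  isBlocked(ip: string): boolean {
    const state = this.attempts.get(ip);
    return state !== undefined && state.blockedUntil > this.now();
  }

  // Call after a wrong OTP. Returns true if this failure triggered a block.
  recordFailure(ip: string): boolean {
    const state = this.attempts.get(ip) ?? { failures: 0, blockedUntil: 0 };
    state.failures += 1;
    const triggered = state.failures >= MAX_FAILURES;
    if (triggered) {
      state.blockedUntil = this.now() + BLOCK_MS;
      state.failures = 0; // start a fresh count once the block expires
    }
    this.attempts.set(ip, state);
    return triggered;
  }

  // Call after a correct OTP to clear the counter for that IP.
  recordSuccess(ip: string): void {
    this.attempts.delete(ip);
  }
}
```

The verification handler would call `isBlocked(ip)` before comparing the code, and log at `warn` level whenever `recordFailure` returns `true`.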
300
+ ---
301
+
302
+ ### 5.3 — Handle Gateway reconnection gracefully
303
+
304
+ **Why:** Currently if the OpenClaw Gateway drops the WebSocket connection (restart, timeout,
305
+ network hiccup), the plugin's `openclawConnected` flag goes false and stays false until the
306
+ ESP32 session is restarted. The Gateway reconnect should happen automatically in the background.
307
+
308
+ **Steps:**
309
+ 1. Open `extensions/esp32-voice/src/voice/voice-session.ts`
310
+ 2. In the `openclawWs.on("close", ...)` handler, instead of just setting `openclawConnected = false`,
311
+ schedule a reconnect after 3 seconds:
312
+ ```typescript
313
+ this.openclawWs.on("close", () => {
314
+ this.openclawConnected = false;
315
+ this.log("info", "OpenClaw disconnected — reconnecting in 3s...");
316
+ setTimeout(() => {
317
+ this.connectToOpenClaw().catch((err) =>
318
+ this.log("error", `Reconnect failed: ${err}`)
319
+ );
320
+ }, 3000);
321
+ });
322
+ ```
323
+ 3. Add an exponential backoff: 3s → 6s → 12s → 24s → 30s (cap at 30s)
324
+ 4. Stop retrying after session `cleanup()` is called
325
+ 5. Commit: `"feat: auto-reconnect to OpenClaw Gateway on disconnect"`
326
+
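The backoff and the stop-on-cleanup behaviour can be sketched together. `reconnectDelayMs` and `ReconnectScheduler` are hypothetical names, and `connect` stands in for the session's `connectToOpenClaw()`; this is one possible shape, not the plugin's actual implementation.

```typescript
const BASE_DELAY_MS = 3_000;
const MAX_DELAY_MS = 30_000;

// Delay before reconnect attempt N (0-based): doubles each time, capped at 30s.
function reconnectDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Hypothetical scheduler: retries `connect` with backoff until stop() is called.
class ReconnectScheduler {
  private attempt = 0;
  private stopped = false;
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly connect: () => Promise<void>;

  constructor(connect: () => Promise<void>) {
    this.connect = connect;
  }

  // Call from the WebSocket "close" handler.
  schedule(): void {
    if (this.stopped) return;
    const delay = reconnectDelayMs(this.attempt);
    this.attempt += 1;
    this.timer = setTimeout(() => {
      this.connect().then(
        () => {
          this.attempt = 0; // success: reset the backoff
        },
        () => this.schedule(), // failure: try again with a longer delay
      );
    }, delay);
  }

  // Call from cleanup() so no reconnect fires after the session ends.
  stop(): void {
    this.stopped = true;
    if (this.timer !== undefined) clearTimeout(this.timer);
  }
}
```

Resetting `attempt` on success means a later disconnect starts again from the 3-second delay rather than from the cap.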
327
+ ---
328
+
329
+ ## 🟢 SECTION 6 — Publishing to npm
330
+
331
+ ### 6.1 — Final pre-publish checklist
332
+
333
+ Run through this checklist in order before running `npm publish`:
334
+
335
+ - [ ] All items in Section 1 (Security) are done
336
+ - [ ] All items in Section 2 (package.json) are done
337
+ - [ ] `SETUP.md` has no real credentials
338
+ - [ ] `ota-server.js` has no hardcoded tokens
339
+ - [ ] `@discordjs/opus` is removed from `package.json`
340
+ - [ ] Version is CalVer format (e.g. `2026.2.21`)
341
+ - [ ] `ota-server.js` is in the `files` array
342
+ - [ ] `.env.example` is in the `files` array
343
+ - [ ] `README.md` has a clear Quick Start section at the top
344
+ - [ ] Run `npm pack --dry-run` and check the file list — no `node_modules/`, no `.env`, no real credentials
345
+ - [ ] Test install in a clean directory from the tarball produced by `npm pack` (the package is not on the registry yet): `mkdir /tmp/test-install && cd /tmp/test-install && npm install <path-to-tarball>.tgz`
346
+
347
+ ### 6.2 — Publish
348
+
349
+ ```bash
350
+ cd extensions/esp32-voice
351
+ npm login # login to npm with your account
352
+ npm publish --access public
353
+ ```
354
+
355
+ After publishing, verify:
356
+ ```bash
357
+ npm info @openclaw/esp32-voice
358
+ ```
359
+
360
+ ---
361
+
362
+ ## 🟢 SECTION 7 — Future Enhancements (Post-Launch)
363
+
364
+ These are not blockers but would significantly improve the plugin:
365
+
366
+ | # | Enhancement | Effort | Impact |
367
+ |---|---|---|---|
368
+ | 7.1 | Add Google STT provider (`src/stt/google.ts`) | Medium | High — alternative to Deepgram |
369
+ | 7.2 | Add OpenAI Whisper STT provider | Medium | High — popular, good accuracy |
370
+ | 7.3 | Add Azure TTS provider | Medium | Medium — enterprise users |
371
+ | 7.4 | Add support for multiple simultaneous ESP32 devices per account | Medium | High |
372
+ | 7.5 | Add WebRTC transport option (lower latency than WebSocket+Opus) | High | Medium |
373
+ | 7.6 | Streaming TTS to ESP32 before full LLM response is ready | High | High — reduces perceived latency |
374
+ | 7.7 | Wake-word detection passthrough from ESP32 | Medium | Medium |
375
+ | 7.8 | Add unit tests for STT/TTS registry, frame pacing, JSON-in-binary detection | Medium | High |
376
+ | 7.9 | CI/CD pipeline for the extension (GitHub Actions) | Low | Medium |
377
+ | 7.10 | Support for Zalo/Line/Telegram as voice backends (not just OpenClaw main session) | High | Medium |
378
+
379
+ ---
380
+
381
+ ## Quick Reference — Architecture
382
+
383
+ ```
384
+ ESP32 (XiaoZhi firmware)
385
+
386
+ │ WebSocket ws://<your-ip>:8765/
387
+
388
+ [esp32-voice plugin — port 8765]
389
+ │ STT: Opus frames → Deepgram → transcript
390
+ │ LLM: transcript → OpenClaw Gateway → response text
391
+ │ TTS: response text → ElevenLabs → PCM → Opus frames
392
+
393
+ │ WebSocket ws://127.0.0.1:18789
394
+
395
+ [OpenClaw Gateway — port 18789]
396
+
397
+
398
+ [AI Model — Gemini / Claude / GPT]
399
+ ```
400
+
401
+ **Key files:**
402
+ - `src/voice/voice-session.ts` — main pipeline orchestrator (STT → LLM → TTS)
403
+ - `src/voice/voice-endpoint.ts` — standalone WebSocket server on port 8765
404
+ - `src/stt/deepgram.ts` — Deepgram STT (VAD + streaming)
405
+ - `src/tts/elevenlabs.ts` — ElevenLabs TTS (serialized audio chain)
406
+ - `src/device/device-otp.ts` — OTP pairing system
407
+ - `ota-server.js` — standalone OTA config server for XiaoZhi firmware
408
+ - `index.ts` — OpenClaw plugin entry point
409
+
410
+ **Known quirks solved (do not revert):**
411
+ - XiaoZhi sends ALL WebSocket messages as binary frames — even JSON. Detection: check if binary frame starts with `0x7b` (`{`) before treating as audio.
412
+ - `@discordjs/opus` prebuilt binaries are rejected by macOS Gatekeeper. Use `opusscript` (pure WASM) instead.
413
+ - ElevenLabs `onAudio` callback must be chained (not fire-and-forget) so Opus frame pacing is respected and sentences play sequentially, not simultaneously.
414
+ - Frame pacing anchor: `nextFrameAt` must be set at the moment the **first** frame is sent, not at function entry — otherwise TTS connection time is counted as debt and early frames are sent with no delay.
415
+
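The first quirk above (JSON arriving in binary frames) can be illustrated with a small classifier. `classifyFrame` and the `IncomingFrame` type are hypothetical names, and falling back to audio when parsing fails is an assumption added here, since an audio payload could in principle begin with the same byte.

```typescript
const OPEN_BRACE = 0x7b; // "{"

type IncomingFrame =
  | { kind: "json"; message: unknown }
  | { kind: "audio"; payload: Uint8Array };

// Classifies a binary WebSocket frame: frames that start with "{" and parse as
// JSON are control messages; everything else is treated as audio.
function classifyFrame(data: Uint8Array): IncomingFrame {
  if (data.length > 0 && data[0] === OPEN_BRACE) {
    try {
      const text = new TextDecoder().decode(data);
      return { kind: "json", message: JSON.parse(text) };
    } catch {
      // Starts with 0x7b but is not valid JSON: fall through to audio.
    }
  }
  return { kind: "audio", payload: data };
}
```

A Node `Buffer` received from the `ws` library is a `Uint8Array` subclass, so it can be passed to `classifyFrame` directly.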
416
+ ---
417
+
418
+ *Last updated: 2026-02-21 | Plugin version: 1.0.0 (pre-release)*
package/index.ts ADDED
@@ -0,0 +1,128 @@
1
+ import type { OpenClawPluginApi } from "openclaw/plugin-sdk";
2
+ import { emptyPluginConfigSchema } from "openclaw/plugin-sdk";
3
+ import { esp32VoicePlugin } from "./src/channel.js";
4
+ import { setEsp32VoiceRuntime } from "./src/runtime.js";
5
+ import { startStandaloneVoiceServer } from "./src/voice/voice-endpoint.js";
6
+
7
+ // Import STT/TTS providers to trigger auto-registration with the registries
8
+ import "./src/stt/deepgram.js";
9
+ import "./src/tts/elevenlabs.js";
10
+
11
+ const VOICE_PORT = parseInt(process.env.ESP32_VOICE_PORT ?? "8765", 10);
12
+
13
+ const plugin = {
14
+ id: "esp32-voice",
15
+ name: "ESP32 Voice",
16
+ description:
17
+ "ESP32 Voice device channel — voice-to-text-to-voice with pluggable STT/TTS providers",
18
+ configSchema: emptyPluginConfigSchema(),
19
+ register(api: OpenClawPluginApi) {
20
+ setEsp32VoiceRuntime(api.runtime);
21
+
22
+ // Register the ESP32 Voice channel
23
+ api.registerChannel({ plugin: esp32VoicePlugin });
24
+
25
+ // ── Gateway HTTP routes (non-WS utilities) ────────────────────
26
+ // These are registered now so they work once the gateway starts.
27
+ // The actual port is fixed at VOICE_PORT — routes reference it by closure.
28
+
29
+ // Info route (tells callers this path needs a WS connection on the voice port)
30
+ api.registerHttpRoute({
31
+ path: "/__openclaw__/esp32-voice/stream",
32
+ handler: (_req, res) => {
33
+ res.writeHead(200, { "Content-Type": "application/json" });
34
+ res.end(
35
+ JSON.stringify({
36
+ service: "esp32-voice",
37
+ type: "websocket",
38
+ hint: `Connect your ESP32 via WebSocket to ws://<your-ip>:${VOICE_PORT}/`,
39
+ voicePort: VOICE_PORT,
40
+ }),
41
+ );
42
+ },
43
+ });
44
+
45
+ // Health endpoint (via Gateway port for convenience)
46
+ api.registerHttpRoute({
47
+ path: "/__openclaw__/esp32-voice/health",
48
+ handler: (_req, res) => {
49
+ res.writeHead(200, { "Content-Type": "application/json" });
50
+ res.end(
51
+ JSON.stringify({
52
+ ok: true,
53
+ service: "esp32-voice",
54
+ voicePort: VOICE_PORT,
55
+ voiceWsUrl: `ws://<your-ip>:${VOICE_PORT}/`,
56
+ sttConfigured: Boolean(
57
+ process.env.DEEPGRAM_API_KEY ||
58
+ api.config?.channels?.esp32voice?.sttApiKey,
59
+ ),
60
+ ttsConfigured: Boolean(
61
+ process.env.ELEVENLABS_API_KEY ||
62
+ process.env.XI_API_KEY ||
63
+ api.config?.channels?.esp32voice?.ttsApiKey,
64
+ ),
65
+ }),
66
+ );
67
+ },
68
+ });
69
+
70
+ // OTP generation endpoint
71
+ api.registerHttpRoute({
72
+ path: "/__openclaw__/esp32-voice/otp",
73
+ handler: async (_req, res) => {
74
+ const { deviceOtpManager } = await import("./src/device/device-otp.js");
75
+ const code = deviceOtpManager.generateOtp();
76
+ res.writeHead(200, { "Content-Type": "application/json" });
77
+ res.end(JSON.stringify({ code, expiresInSeconds: 300 }));
78
+ },
79
+ });
80
+
81
+ // Paired devices listing
82
+ api.registerHttpRoute({
83
+ path: "/__openclaw__/esp32-voice/devices",
84
+ handler: async (_req, res) => {
85
+ const { deviceOtpManager } = await import("./src/device/device-otp.js");
86
+ const devices = deviceOtpManager.listPairedDevices();
87
+ res.writeHead(200, { "Content-Type": "application/json" });
88
+ res.end(JSON.stringify({ devices }));
89
+ },
90
+ });
91
+
92
+ // ── Standalone Voice WebSocket Server (gateway-only service) ──
93
+ //
94
+ // The OpenClaw Gateway plugin API does NOT support WebSocket upgrade
95
+ // registration — registerHttpRoute() only handles regular HTTP requests.
96
+ // When an ESP32 tries to upgrade to WebSocket on the Gateway port (18789),
97
+ // the Gateway's own upgrade handler intercepts it and routes it to the
98
+ // Gateway's internal WS server instead of this plugin.
99
+ //
100
+ // Solution: spin up a dedicated HTTP server on a separate port (8765).
101
+ // The ESP32 firmware connects directly to this server. No core changes needed.
102
+ //
103
+ // Registered as a SERVICE so it only starts when the gateway starts,
104
+ // NOT during CLI commands like `channels add` (which would conflict
105
+ // with any already-running gateway on the same port).
106
+ //
107
+ api.registerService({
108
+ id: "esp32-voice-server",
109
+ start: async () => {
110
+ const { port } = startStandaloneVoiceServer(VOICE_PORT);
111
+ console.log("[esp32voice] Plugin registered successfully");
112
+ console.log(`[esp32voice] Voice WebSocket (standalone): ws://0.0.0.0:${port}/`);
113
+ console.log(`[esp32voice] Point your ESP32 to: ws://<your-mac-ip>:${port}/`);
114
+ console.log(`[esp32voice] Health check (Gateway port): http://<gateway>/__openclaw__/esp32-voice/health`);
115
+ console.log(`[esp32voice] Generate OTP: http://<gateway>/__openclaw__/esp32-voice/otp`);
116
+ },
117
+ });
118
+ },
119
+ };
120
+
121
+ export default plugin;
122
+
123
+ // Exports for consumers / third-party provider plugins
124
+ export { startStandaloneVoiceServer };
125
+ export { sttRegistry } from "./src/stt/stt-registry.js";
126
+ export { ttsRegistry } from "./src/tts/tts-registry.js";
127
+ export type { SttProvider, SttProviderConfig, SttProviderMeta } from "./src/stt/stt-provider.js";
128
+ export type { TtsProvider, TtsProviderConfig, TtsProviderMeta } from "./src/tts/tts-provider.js";
@@ -0,0 +1,9 @@
1
+ {
2
+ "id": "esp32-voice",
3
+ "channels": ["esp32voice"],
4
+ "configSchema": {
5
+ "type": "object",
6
+ "additionalProperties": false,
7
+ "properties": {}
8
+ }
9
+ }
package/package.json ADDED
@@ -0,0 +1,62 @@
1
+ {
2
+ "name": "@cheeko-ai/esp32-voice",
3
+ "version": "2026.2.21",
4
+ "private": false,
5
+ "description": "OpenClaw ESP32 Voice channel plugin — voice-to-text-to-voice device integration with pluggable STT/TTS providers",
6
+ "type": "module",
7
+ "license": "MIT",
8
+ "main": "index.ts",
9
+ "files": [
10
+ "index.ts",
11
+ "src/",
12
+ "openclaw.plugin.json",
13
+ "README.md",
14
+ "TODO.md",
15
+ "NPM_PUBLISH_READINESS.md"
16
+ ],
17
+ "keywords": [
18
+ "openclaw",
19
+ "esp32",
20
+ "voice",
21
+ "stt",
22
+ "tts",
23
+ "deepgram",
24
+ "elevenlabs",
25
+ "iot",
26
+ "speech-to-text",
27
+ "text-to-speech"
28
+ ],
29
+ "repository": {
30
+ "type": "git",
31
+ "url": "https://github.com/openclaw/openclaw",
32
+ "directory": "extensions/esp32-voice"
33
+ },
34
+ "dependencies": {
35
+ "opusscript": "^0.0.8",
36
+ "ws": "^8.18.0",
37
+ "zod": "^4.3.6"
38
+ },
39
+ "devDependencies": {
40
+ "@types/ws": "^8.5.12",
41
+ "openclaw": ">=2026.1.0"
42
+ },
43
+ "openclaw": {
44
+ "extensions": [
45
+ "./index.ts"
46
+ ],
47
+ "channel": {
48
+ "id": "esp32voice",
49
+ "label": "ESP32 Voice",
50
+ "selectionLabel": "ESP32 Voice (plugin)",
51
+ "docsPath": "/channels/esp32-voice",
52
+ "docsLabel": "esp32-voice",
53
+ "blurb": "ESP32 voice device channel — speech-to-text-to-speech with pluggable STT/TTS providers (Deepgram, ElevenLabs, and more).",
54
+ "order": 90
55
+ },
56
+ "install": {
57
+ "npmSpec": "@cheeko-ai/esp32-voice",
58
+ "localPath": "extensions/esp32-voice",
59
+ "defaultChoice": "npm"
60
+ }
61
+ }
62
+ }