@cheeko-ai/esp32-voice 2026.2.21

@@ -0,0 +1,299 @@
1
+ # ESP32-Voice Plugin — npm Publishing Readiness Report
2
+
3
+ > Full analysis of what needs to change before this plugin can be published to npm
4
+ > and work out-of-the-box for other OpenClaw users.
5
+
6
+ ---
7
+
8
+ ## Current Status: NOT READY for npm publish
9
+
10
+ ---
11
+
12
+ ## 🔴 Section 1 — Security (Fix Before Anything Else)
13
+
14
+ ### 1.1 Real API keys exposed in `SETUP.md`
15
+
16
+ `SETUP.md` contains live, active credentials committed to the repository:
17
+
18
+ ```
19
+ GEMINI_API_KEY=YOUR_GEMINI_API_KEY_HERE
20
+ ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY_HERE
21
+ ```
22
+
23
+ **Action required:**
24
+ 1. Rotate both keys immediately (they are now public):
25
+ - ElevenLabs: https://elevenlabs.io/app/settings/api-keys → delete → create new
26
+ - Google AI Studio: https://aistudio.google.com/apikey → delete → create new
27
+ 2. Replace all real values in `SETUP.md` with placeholders like `<YOUR_ELEVENLABS_API_KEY>`
28
+
29
+ ---
30
+
31
+ ### 1.2 Hardcoded fallback gateway token in `ota-server.js`
32
+
33
+ ```js
34
+ // Line ~59 in ota-server.js
35
+ const GATEWAY_TOKEN = process.env.GATEWAY_TOKEN || "YOUR_GATEWAY_TOKEN_HERE";
36
+ ```
37
+
38
+ This token is now public. Any user who forgets to set the env var silently uses this known token.
39
+
40
+ **Action required:**
41
+ Replace with a hard exit if the env var is not set:
42
+ ```js
43
+ const GATEWAY_TOKEN = process.env.GATEWAY_TOKEN;
44
+ if (!GATEWAY_TOKEN) {
45
+ console.error("[ota-server] ERROR: GATEWAY_TOKEN env var is required.");
46
+ process.exit(1);
47
+ }
48
+ ```
49
+
50
+ ---
51
+
52
+ ## 🔴 Section 2 — package.json Fixes (Required for npm publish)
53
+
54
+ ### 2.1 Remove unused `@discordjs/opus` dependency
55
+
56
+ `@discordjs/opus` is listed in `dependencies` but is **never imported anywhere** in the source.
57
+ It was replaced by `opusscript` after macOS Gatekeeper rejected its prebuilt native binary.
58
+ Leaving it in `dependencies` means every user downloads and tries to install a native binary
59
+ they don't need — and it will fail on many systems.
60
+
61
+ ```bash
62
+ cd extensions/esp32-voice
63
+ npm uninstall @discordjs/opus
64
+ ```
65
+
66
+ **Why opusscript needs no install script:**
67
+ `opusscript` is pure JavaScript/WebAssembly. It requires:
68
+ - No native compilation
69
+ - No `node-gyp`
70
+ - No system libraries (no `libopus` to install)
71
+ - No pre/post-install scripts
72
+
73
+ It installs cleanly on macOS, Linux, and Windows via a plain `npm install`. No extra steps for users.
74
+
75
+ ---
76
+
77
+ ### 2.2 Update version to CalVer
78
+
79
+ All OpenClaw extensions use CalVer format (`YYYY.M.D`). This package has `"version": "1.0.0"`.
80
+
81
+ Change to: `"version": "2026.2.21"` (or the date of first publish)
82
+
83
+ ---
84
+
85
+ ### 2.3 Add `ota-server.js` to the `files` array
86
+
87
+ `ota-server.js` is not in the `files` array so it will be excluded from the npm package.
88
+ Users will install the plugin but have no OTA server.
89
+
90
+ Current `files` array:
91
+ ```json
92
+ "files": ["index.ts", "src/", "openclaw.plugin.json", "README.md"]
93
+ ```
94
+
95
+ Should be:
96
+ ```json
97
+ "files": ["index.ts", "src/", "openclaw.plugin.json", "README.md", "ota-server.js", ".env.example"]
98
+ ```
99
+
100
+ ---
101
+
102
+ ### 2.4 Add `.env.example` file (new file to create)
103
+
104
+ Users need to know what env vars to set. Create `extensions/esp32-voice/.env.example`:
105
+
106
+ ```bash
107
+ # Required — get free key at https://console.deepgram.com
108
+ DEEPGRAM_API_KEY=<your-deepgram-api-key>
109
+
110
+ # Required — get free key at https://elevenlabs.io
111
+ ELEVENLABS_API_KEY=<your-elevenlabs-api-key>
112
+
113
+ # Optional — find voice IDs at https://elevenlabs.io/voice-library
114
+ # Default: Rachel (21m00Tcm4TlvDq8ikWAM)
115
+ ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM
116
+
117
+ # Optional — default: eleven_turbo_v2_5
118
+ ELEVENLABS_MODEL_ID=eleven_turbo_v2_5
119
+
120
+ # Optional — default: nova-2
121
+ DEEPGRAM_MODEL=nova-2
122
+
123
+ # Optional — port for ESP32 WebSocket server (default: 8765)
124
+ ESP32_VOICE_PORT=8765
125
+
126
+ # Required for OTA server — copy from your openclaw.json or gateway setup
127
+ GATEWAY_TOKEN=<your-openclaw-gateway-token>
128
+
129
+ # Optional — default: ws://127.0.0.1:18789
130
+ OPENCLAW_GATEWAY_URL=ws://127.0.0.1:18789
131
+ ```
132
+
133
+ ---
134
+
135
+ ## 🟡 Section 3 — Setup Experience for New Users
136
+
137
+ ### What a new user has to do today (too many steps)
138
+
139
+ 1. Install OpenClaw and the plugin
140
+ 2. Sign up for Deepgram (STT) — get API key
141
+ 3. Sign up for ElevenLabs (TTS) — get API key
142
+ 4. Add keys to `~/.openclaw/.env`
143
+ 5. Edit `~/.openclaw/openclaw.json` with a device token (generate manually)
144
+ 6. Start OpenClaw Gateway in one terminal
145
+ 7. Start `ota-server.js` in a **second terminal**
146
+ 8. Find their machine's LAN IP address
147
+ 9. Flash ESP32 with the OTA URL
148
+ 10. Reboot ESP32 and hope auto-detect worked
149
+
150
+ **Pain point:** Two separate processes, manual IP lookup, unclear token generation.
151
+
152
+ ---
153
+
154
+ ### 3.1 Fix hardcoded timezone in `ota-server.js`
155
+
156
+ The OTA server hardcodes IST (UTC+5:30 = 330 minutes). Users in other timezones get wrong device time on their ESP32.
157
+
158
+ Current:
159
+ ```js
160
+ timezone_offset: 330 // hardcoded IST
161
+ ```
162
+
163
+ Fix:
164
+ ```js
165
+ timezone_offset: -new Date().getTimezoneOffset() // reads system timezone
166
+ ```
167
+
168
+ > `getTimezoneOffset()` returns minutes west of UTC (negative for east of UTC),
169
+ > so negating it gives the standard "minutes east of UTC" that XiaoZhi expects.
170
+
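The sign convention above is easy to get backwards, so here is a minimal sketch of it. The helper names `minutesEastOfUtc` and `formatOffset` are hypothetical and not part of `ota-server.js`; only `Date.prototype.getTimezoneOffset()` is a real API.

```javascript
// Hypothetical helpers illustrating the sign convention; not taken from ota-server.js.

// getTimezoneOffset() returns (UTC - local time) in minutes, so a host in
// IST (UTC+5:30) reports -330. Negating gives minutes east of UTC: 330.
function minutesEastOfUtc(date = new Date()) {
  const offset = -date.getTimezoneOffset();
  return offset === 0 ? 0 : offset; // normalise -0 to 0 for hosts on UTC
}

// Render an offset in minutes east of UTC as a readable label for logs.
function formatOffset(minutesEast) {
  const sign = minutesEast < 0 ? "-" : "+";
  const abs = Math.abs(minutesEast);
  const hh = String(Math.floor(abs / 60)).padStart(2, "0");
  const mm = String(abs % 60).padStart(2, "0");
  return `UTC${sign}${hh}:${mm}`;
}

console.log(`timezone_offset=${minutesEastOfUtc()} (${formatOffset(minutesEastOfUtc())})`);
```

Logging the formatted label next to the raw number at startup makes a wrong sign visible before the device shows the wrong time.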
171
+ ---
172
+
173
+ ### 3.2 Integrate OTA endpoint into the plugin HTTP handler (removes second terminal)
174
+
175
+ Currently `ota-server.js` must be run as a separate process. The plugin already has an
176
+ HTTP handler (`src/http-handler.ts`). Adding the OTA route there means users only run
177
+ one command — `openclaw gateway` — and everything works.
178
+
179
+ The OTA route should be served at:
180
+ ```
181
+ http://<your-lan-ip>:18789/__openclaw__/esp32-voice/ota/
182
+ ```
183
+
184
+ Payload returned (same as ota-server.js currently returns):
185
+ ```json
186
+ {
187
+ "websocket": {
188
+ "url": "ws://<LAN_IP>:8765/"
189
+ },
190
+ "openclaw": {
191
+ "url": "ws://127.0.0.1:18789",
192
+ "token": "<GATEWAY_TOKEN>"
193
+ },
194
+ "timezone_offset": 330
195
+ }
196
+ ```
197
+
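As a rough sketch, the payload could be assembled by one pure function that both the standalone server and a future plugin route call. `buildOtaPayload` and its parameter names are assumptions made for illustration; the field names and default ports are the ones shown in this report.

```javascript
// Hypothetical helper; the real ota-server.js may build this object differently.
function buildOtaPayload({
  lanIp,
  voicePort = 8765,
  gatewayUrl = "ws://127.0.0.1:18789",
  gatewayToken,
  timezoneOffset, // minutes east of UTC, e.g. 330 for IST
}) {
  if (!lanIp) throw new Error("lanIp is required");
  if (!gatewayToken) throw new Error("gatewayToken is required");
  return {
    websocket: { url: `ws://${lanIp}:${voicePort}/` },
    openclaw: { url: gatewayUrl, token: gatewayToken },
    timezone_offset: timezoneOffset,
  };
}

console.log(JSON.stringify(
  buildOtaPayload({ lanIp: "192.168.1.10", gatewayToken: "<GATEWAY_TOKEN>", timezoneOffset: 330 }),
  null,
  2,
));
```

Keeping the builder free of I/O means the same function can back either HTTP entry point and be unit-tested without starting a server.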
198
+ Once integrated, update `README.md` to say "OTA is automatically available — no second server needed."
199
+
200
+ ---
201
+
202
+ ### 3.3 Rewrite `README.md` top section as a 3-step Quick Start
203
+
204
+ The current README has good detail but buries the getting-started path. New users need:
205
+
206
+ ```
207
+ ## Quick Start
208
+
209
+ ### Step 1 — Get API keys (both have free tiers)
210
+ - Deepgram: https://console.deepgram.com → API Keys → Create
211
+ - ElevenLabs: https://elevenlabs.io → Profile → API Keys
212
+
213
+ ### Step 2 — Configure
214
+ cp .env.example ~/.openclaw/.env
215
+ # Edit ~/.openclaw/.env with your keys
216
+
217
+ ### Step 3 — Flash your ESP32
218
+ Point your XiaoZhi firmware OTA URL to:
219
+ http://<your-machine-lan-ip>:18789/__openclaw__/esp32-voice/ota/
220
+ Reboot the device. Done.
221
+ ```
222
+
223
+ ---
224
+
225
+ ## 🟡 Section 4 — Reliability Improvements
226
+
227
+ ### 4.1 Persist OTP pairing across Gateway restarts
228
+
229
+ Currently paired devices are stored in memory only. If the Gateway restarts, all ESP32
230
+ devices need to be re-paired with a new OTP. This is very annoying for always-on devices.
231
+
232
+ Fix: persist the paired device map to `~/.openclaw/esp32-voice-devices.json` on each
233
+ successful pairing and reload it on startup.
234
+
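One possible shape for this, assuming the paired-device map is a `Map` keyed by device ID with JSON-serialisable values (the real in-memory structure is not shown in this report). The function names are hypothetical; the file path is the one proposed above.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Path proposed in this report; the function names below are illustrative only.
const DEFAULT_STORE = path.join(os.homedir(), ".openclaw", "esp32-voice-devices.json");

// Load the paired-device map, treating a missing file as "no devices yet".
function loadPairedDevices(file = DEFAULT_STORE) {
  try {
    return new Map(Object.entries(JSON.parse(fs.readFileSync(file, "utf8"))));
  } catch (err) {
    if (err.code === "ENOENT") return new Map();
    throw err; // corrupt JSON or permission errors should surface, not be hidden
  }
}

// Write to a temp file and rename it so a crash mid-write cannot truncate the store.
function savePairedDevices(devices, file = DEFAULT_STORE) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(Object.fromEntries(devices), null, 2), { mode: 0o600 });
  fs.renameSync(tmp, file);
}
```

Calling `savePairedDevices` after each successful pairing and `loadPairedDevices` once at startup would match the fix described above. The `0o600` mode is a suggestion, on the assumption that the file may hold pairing secrets.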
235
+ ---
236
+
237
+ ### 4.2 Add rate limiting to OTP verification
238
+
239
+ The OTP is 6 digits (1,000,000 possible values). Without rate limiting, someone on the
240
+ same LAN could brute-force it in minutes. Add a per-IP attempt counter: lock out for
241
+ 15 minutes after 5 failed attempts.
242
+
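A minimal in-memory sketch of the per-IP counter described above. The class name and its methods are hypothetical, and the clock is injectable only to make the lockout testable. The plugin's actual OTP verification code is not shown here, so the wiring into it is left out.

```javascript
const MAX_FAILED_ATTEMPTS = 5;
const LOCKOUT_MS = 15 * 60 * 1000; // 15 minutes

// Hypothetical helper: tracks failed OTP attempts per client IP.
class OtpRateLimiter {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map(); // ip -> { failures, lockedUntil }
  }

  // True while the IP is inside its lockout window.
  isLocked(ip) {
    const entry = this.entries.get(ip);
    if (!entry) return false;
    if (entry.lockedUntil > this.now()) return true;
    if (entry.lockedUntil > 0) this.entries.delete(ip); // lockout expired, start fresh
    return false;
  }

  // Call after a wrong OTP; the fifth failure starts the lockout.
  recordFailure(ip) {
    const entry = this.entries.get(ip) ?? { failures: 0, lockedUntil: 0 };
    entry.failures += 1;
    if (entry.failures >= MAX_FAILED_ATTEMPTS) {
      entry.lockedUntil = this.now() + LOCKOUT_MS;
    }
    this.entries.set(ip, entry);
  }

  // Call after a correct OTP so earlier typos do not count against the client.
  recordSuccess(ip) {
    this.entries.delete(ip);
  }
}
```

A verification handler would check `isLocked(ip)` before comparing the OTP and reject immediately when it returns `true`.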
243
+ ---
244
+
245
+ ### 4.3 Auto-reconnect to Gateway on disconnect
246
+
247
+ If the OpenClaw Gateway drops the connection (restart, timeout, network blip),
248
+ `openclawConnected` goes false and stays false until the ESP32 session is restarted.
249
+ The session should reconnect automatically with exponential backoff (3s → 6s → 12s → 24s, capped at 30s).
250
+
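The backoff schedule can be sketched as follows. `reconnectDelayMs` and `scheduleReconnect` are hypothetical names, and `connect` stands in for whatever function the plugin uses to open the Gateway connection. The sketch assumes that function returns a promise that rejects on failure.

```javascript
const BASE_DELAY_MS = 3000;
const MAX_DELAY_MS = 30000;

// Delay before retry number `attempt` (0-based): 3s, 6s, 12s, 24s, then 30s.
function reconnectDelayMs(attempt) {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Retry `connect` until it resolves, waiting longer after each failure.
// `setTimer` is injectable so the schedule can be exercised without real timers.
function scheduleReconnect(connect, attempt = 0, setTimer = setTimeout) {
  return setTimer(async () => {
    try {
      await connect();
    } catch {
      scheduleReconnect(connect, attempt + 1, setTimer);
    }
  }, reconnectDelayMs(attempt));
}
```

A successful connect should start the next disconnect from `attempt = 0`, so a brief network blip is retried after 3 seconds rather than 30.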
251
+ ---
252
+
253
+ ## 🟢 Section 5 — Pre-publish Checklist
254
+
255
+ Run through this before `npm publish`:
256
+
257
+ - [ ] Rotated the exposed API keys (ElevenLabs + Gemini)
258
+ - [ ] `SETUP.md` has no real credentials — only `<PLACEHOLDER>` values
259
+ - [ ] Hardcoded fallback token removed from `ota-server.js`
260
+ - [ ] `@discordjs/opus` removed from `package.json` dependencies
261
+ - [ ] Version updated to CalVer (e.g. `2026.2.21`)
262
+ - [ ] `ota-server.js` added to `files` array in `package.json`
263
+ - [ ] `.env.example` created and added to `files` array
264
+ - [ ] Timezone fix applied in `ota-server.js`
265
+ - [ ] README has a 3-step Quick Start at the very top
266
+ - [ ] Ran `npm pack --dry-run` and verified:
267
+ - No `node_modules/` in the package
268
+ - No `.env` file in the package
269
+ - No real credentials in any file
270
+ - [ ] Test clean install: `mkdir /tmp/test && cd /tmp/test && npm install @openclaw/esp32-voice`
271
+
272
+ Then publish:
273
+ ```bash
274
+ cd extensions/esp32-voice
275
+ npm login
276
+ npm publish --access public
277
+ ```
278
+
279
+ ---
280
+
281
+ ## Summary Table
282
+
283
+ | # | Issue | Severity | Effort |
284
+ |---|---|---|---|
285
+ | 1.1 | Real API keys in SETUP.md | 🔴 Critical | 5 min |
286
+ | 1.2 | Hardcoded fallback token in ota-server.js | 🔴 Critical | 5 min |
287
+ | 2.1 | Remove unused @discordjs/opus | 🔴 Blocker | 2 min |
288
+ | 2.2 | Version to CalVer | 🔴 Blocker | 1 min |
289
+ | 2.3 | Add ota-server.js to files array | 🔴 Blocker | 2 min |
290
+ | 2.4 | Create .env.example | 🟡 Important | 10 min |
291
+ | 3.1 | Fix hardcoded IST timezone | 🟡 Important | 5 min |
292
+ | 3.2 | Integrate OTA into plugin HTTP handler | 🟡 Important | 2–3 hours |
293
+ | 3.3 | Rewrite README Quick Start | 🟡 Important | 30 min |
294
+ | 4.1 | Persist OTP pairing to disk | 🟡 Nice to have | 1 hour |
295
+ | 4.2 | Rate limit OTP attempts | 🟡 Nice to have | 1 hour |
296
+ | 4.3 | Auto-reconnect to Gateway | 🟡 Nice to have | 1 hour |
297
+
298
+ **Minimum to publish safely: items 1.1, 1.2, 2.1, 2.2, 2.3**
299
+ **Minimum for a good user experience: all of Section 2 + Section 3**
package/README.md ADDED
@@ -0,0 +1,226 @@
1
+ # 🎤 ESP32 Voice — OpenClaw Extension
2
+
3
+ Turn a XiaoZhi ESP32 board into a voice AI assistant powered by OpenClaw.
4
+ Push to talk → speak → get a spoken response. That's it.
5
+
6
+ ---
7
+
8
+ ## Quick Start
9
+
10
+ ### Step 1 — Install the plugin
11
+
12
+ ```bash
13
+ openclaw plugins install @openclaw/esp32-voice
14
+ ```
15
+
16
+ When prompted by the OpenClaw setup wizard, enter:
17
+
18
+ | Prompt | What to enter |
19
+ |---|---|
20
+ | Deepgram API key | Get free at [console.deepgram.com](https://console.deepgram.com) → API Keys → Create |
21
+ | ElevenLabs API key | Get free at [elevenlabs.io](https://elevenlabs.io) → Profile → API Keys |
22
+ | ElevenLabs Voice ID | Leave blank for default (Rachel), or pick from [voice library](https://elevenlabs.io/voice-library) |
23
+
24
+ That's all the configuration needed. Keys are saved to `~/.openclaw/.env` automatically.
25
+
26
+ ---
27
+
28
+ ### Step 2 — Start the Gateway and OTA server
29
+
30
+ ```bash
31
+ # Terminal 1 — OpenClaw Gateway (AI brain)
32
+ openclaw gateway
33
+
34
+ # Terminal 2 — OTA config server (tells your ESP32 where to connect)
35
+ GATEWAY_TOKEN=<your-gateway-token> node $(openclaw plugins path @openclaw/esp32-voice)/ota-server.js
36
+ ```
37
+
38
+ The OTA server will print your connection URL:
39
+
40
+ ```
41
+ 🦞 ESP32 OTA Mock Server
42
+ Auto-detected MAC IP : 192.168.1.10
43
+ OTA Server : http://192.168.1.10:8080/xiaozhi/ota/
44
+ Voice WebSocket : ws://192.168.1.10:8765/
45
+ ```
46
+
47
+ > **Your gateway token** is in `~/.openclaw/openclaw.json` under `gateway.token`,
48
+ > or shown when you run `openclaw gateway --show-token`.
49
+
50
+ ---
51
+
52
+ ### Step 3 — Flash your ESP32
53
+
54
+ In your XiaoZhi firmware settings, set the OTA URL to what the server printed:
55
+
56
+ ```
57
+ http://192.168.1.10:8080/xiaozhi/ota/
58
+ ```
59
+
60
+ Reboot the device. It fetches its config automatically, connects, and is ready to use.
61
+ **Hold the button → speak → release → hear the response.**
62
+
63
+ ---
64
+
65
+ ## How It Works
66
+
67
+ ```
68
+ ESP32 (XiaoZhi firmware)
69
+ │ Opus audio frames (WebSocket, port 8765)
70
+
71
+ [esp32-voice plugin]
72
+ │ STT: Deepgram → transcript text
73
+ │ LLM: OpenClaw Gateway (port 18789) → response text
74
+ │ TTS: ElevenLabs → Opus audio frames
75
+
76
+ ESP32 speaker
77
+ ```
78
+
79
+ The plugin runs its own WebSocket server on port **8765** (separate from the Gateway).
80
+ The ESP32 connects directly to this server — no Gateway changes required.
81
+
82
+ ---
83
+
84
+ ## Configuration Reference
85
+
86
+ All config goes in `~/.openclaw/openclaw.json` under `channels.esp32voice`.
87
+ Keys can also be set in `~/.openclaw/.env`.
88
+
89
+ ### Single device
90
+
91
+ ```json5
92
+ {
93
+ channels: {
94
+ esp32voice: {
95
+ enabled: true,
96
+ sttApiKey: "your-deepgram-key", // or set DEEPGRAM_API_KEY in .env
97
+ ttsApiKey: "your-elevenlabs-key", // or set ELEVENLABS_API_KEY in .env
98
+ ttsVoiceId: "21m00Tcm4TlvDq8ikWAM", // optional, defaults to Rachel
99
+ language: "en",
100
+ maxResponseLength: 500,
101
+ voiceOptimized: true,
102
+ },
103
+ },
104
+ }
105
+ ```
106
+
107
+ ### Multiple devices
108
+
109
+ ```json5
110
+ {
111
+ channels: {
112
+ esp32voice: {
113
+ enabled: true,
114
+ accounts: {
115
+ office: {
116
+ name: "Office Assistant",
117
+ sttApiKey: "your-deepgram-key",
118
+ ttsApiKey: "your-elevenlabs-key",
119
+ language: "en",
120
+ },
121
+ bedroom: {
122
+ name: "Bedroom Assistant",
123
+ sttApiKey: "your-deepgram-key",
124
+ ttsApiKey: "your-elevenlabs-key",
125
+ language: "en",
126
+ maxResponseLength: 300,
127
+ },
128
+ },
129
+ },
130
+ },
131
+ }
132
+ ```
133
+
134
+ ### All options
135
+
136
+ | Key | Type | Default | Description |
137
+ |---|---|---|---|
138
+ | `enabled` | boolean | `true` | Enable/disable the channel |
139
+ | `sttProvider` | string | `"deepgram"` | STT provider ID |
140
+ | `sttApiKey` | string | — | Deepgram API key |
141
+ | `sttModel` | string | `"nova-2"` | Deepgram model |
142
+ | `ttsProvider` | string | `"elevenlabs"` | TTS provider ID |
143
+ | `ttsApiKey` | string | — | ElevenLabs API key |
144
+ | `ttsVoiceId` | string | Rachel | ElevenLabs voice ID |
145
+ | `ttsModel` | string | `"eleven_turbo_v2_5"` | ElevenLabs model |
146
+ | `language` | string | `"en"` | Language code (ISO 639-1) |
147
+ | `maxResponseLength` | number | `500` | Max response chars (keep short for voice) |
148
+ | `voiceOptimized` | boolean | `true` | Tells the AI to respond concisely without markdown |
149
+
150
+ ### Environment variables
151
+
152
+ | Variable | Description |
153
+ |---|---|
154
+ | `DEEPGRAM_API_KEY` | Deepgram STT API key |
155
+ | `ELEVENLABS_API_KEY` | ElevenLabs TTS API key |
156
+ | `ELEVENLABS_VOICE_ID` | ElevenLabs voice ID (optional) |
157
+ | `ELEVENLABS_MODEL_ID` | ElevenLabs model (optional) |
158
+ | `DEEPGRAM_MODEL` | Deepgram model (optional) |
159
+ | `ESP32_VOICE_PORT` | Voice WebSocket server port (default: `8765`) |
160
+ | `GATEWAY_TOKEN` | Required for OTA server |
161
+ | `CHEEKO_PAIR` | One-time pairing token from Cheeko dashboard (optional — enables auto-registration) |
162
+ | `CHEEKO_DASHBOARD_URL` | Cheeko dashboard UI URL shown in browser (default: `http://64.227.170.31:8001`) |
163
+ | `CHEEKO_API_URL` | Cheeko backend API base URL (default: `http://64.227.170.31:8002/toy`) |
164
+ | `MAC_IP` | Override auto-detected LAN IP for Cheeko pairing |
165
+
166
+ ---
167
+
168
+ ## OTA Server
169
+
170
+ The OTA server (`ota-server.js`) is a small HTTP server that your ESP32 calls on boot
171
+ to get its configuration automatically — WebSocket URL, auth token, and timezone.
172
+
173
+ ```bash
174
+ # Basic usage
175
+ GATEWAY_TOKEN=<token> node ota-server.js
176
+
177
+ # With overrides
178
+ MAC_IP=192.168.1.50 VOICE_PORT=8765 OTA_PORT=8080 GATEWAY_TOKEN=<token> node ota-server.js
179
+ ```
180
+
181
+ | Env var | Default | Description |
182
+ |---|---|---|
183
+ | `GATEWAY_TOKEN` | — | **Required.** Your OpenClaw gateway token |
184
+ | `MAC_IP` | auto-detected | Your machine's LAN IP (override if auto-detect is wrong) |
185
+ | `VOICE_PORT` | `8765` | Port the ESP32 voice WebSocket runs on |
186
+ | `OTA_PORT` | `8080` | Port the OTA server listens on |
187
+ | `TZ_OFFSET` | system timezone | Minutes east of UTC (e.g. `330` for IST, `0` for UTC) |
188
+
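For illustration, the table above could map to a small parser like this. `readOtaConfig` is a hypothetical name and the returned field names are assumptions; the env var names and defaults are the ones listed in the table.

```javascript
// Hypothetical sketch of reading the OTA server settings; not the actual ota-server.js code.
function readOtaConfig(env, systemOffsetMinutes = -new Date().getTimezoneOffset()) {
  if (!env.GATEWAY_TOKEN) {
    throw new Error("GATEWAY_TOKEN env var is required.");
  }
  // Parse an optional integer setting, falling back when it is unset or empty.
  const toInt = (name, fallback) => {
    const value = env[name];
    if (value === undefined || value === "") return fallback;
    const n = Number.parseInt(value, 10);
    if (Number.isNaN(n)) throw new Error(`${name} must be an integer, got "${value}"`);
    return n;
  };
  return {
    gatewayToken: env.GATEWAY_TOKEN,
    lanIp: env.MAC_IP || null, // null means "auto-detect the LAN IP"
    voicePort: toInt("VOICE_PORT", 8765),
    otaPort: toInt("OTA_PORT", 8080),
    tzOffset: toInt("TZ_OFFSET", systemOffsetMinutes), // minutes east of UTC
  };
}
```

Passing `process.env` as the first argument reads the live environment. Passing a plain object instead keeps the function easy to test.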
189
+ ---
190
+
191
+ ## Troubleshooting
192
+
193
+ **ESP32 shows "connecting" but never "listening"**
194
+ - Check that the OTA server is running and the ESP32 fetched its config (watch OTA server logs)
195
+ - Make sure your machine's firewall allows port 8765
196
+ - Confirm `GATEWAY_TOKEN` matches the token in your `~/.openclaw/openclaw.json`
197
+
198
+ **"device signature invalid" error in gateway logs**
199
+ - The device identity file at `~/.openclaw/identity/device.json` may be missing or corrupt
200
+ - Run `openclaw onboard` to regenerate it
201
+
202
+ **Audio plays but all sentences overlap / sound garbled**
203
+ - This is a known firmware quirk on some XiaoZhi builds — update to the latest firmware
204
+ - The plugin sends sentences sequentially with proper pacing — the issue is on the device side
205
+
206
+ **No audio from ESP32 speaker**
207
+ - Confirm `output_format=pcm_24000` — the plugin outputs 24kHz 16-bit mono PCM encoded as Opus
208
+ - Frame duration is 60ms — if your firmware expects a different duration, set `OUTPUT_FRAME_MS` in `voice-session.ts`
209
+
210
+ **STT timeout / empty transcript**
211
+ - Check Deepgram API key is valid: `curl https://api.deepgram.com/v1/auth -H "Authorization: Token YOUR_KEY"`
212
+ - Increase endpointing if speech is being cut off too early (edit `deepgram.ts` `endpointing` param)
213
+
214
+ ---
215
+
216
+ ## Supported Hardware
217
+
218
+ Tested with:
219
+ - **Jiuchuan S3** (XiaoZhi ESP32-S3 board) — recommended
220
+ - Any ESP32 board running [XiaoZhi firmware](https://github.com/78/xiaozhi-esp32)
221
+
222
+ ---
223
+
224
+ ## License
225
+
226
+ MIT — Part of the [OpenClaw](https://openclaw.ai) project.