oomi-ai 0.2.25 → 0.2.28

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -4,12 +4,14 @@ OpenClaw channel plugin and bridge tooling for Oomi managed chat and voice.
 
  ## Current Focus
 
- `0.2.21` adds the first live persona automation lane:
+ `0.2.28` keeps the persona automation lane and adds a usable local managed-voice validation path:
  - WebSpatial-based persona scaffolding for generated Oomi apps
  - a high-level `oomi personas create-managed` command for agent-driven persona creation
  - device-authenticated persona runtime registration and job callbacks
  - automatic bridge-side polling for queued `persona_job` control messages
- - end-to-end local persona startup from a structured orchestration payload
+ - one shared spoken-metadata normalizer used by both the extension and the bridge
+ - a repo-backed local `tts-pipeline` replay that can validate assistant-final -> backend -> real Qwen TTS before publishing
+ - spoken-metadata handling that preserves natural pauses like `...` and keeps the managed voice contract valid on the real chat session path
 
  This package is for two audiences:
  - OpenClaw operators who need to connect a machine to Oomi and keep chat or voice healthy
@@ -148,6 +150,46 @@ For managed cloned-voice replies, the canonical contract is:
 
  The backend cloned-voice path is intentionally strict. If `metadata.spoken` does not reach Oomi, backend TTS fails instead of speaking a flat fallback voice.
 
+ ## Local TTS Validation
+
+ If you are developing this package inside the Oomi repo, you can now validate the managed voice path locally before publishing.
+
+ This local gate does three things:
+ - replays an assistant `chat.final` frame through the same spoken-metadata normalization path used by the OpenClaw extension and the bridge
+ - feeds that normalized frame into the Rails backend replay harness
+ - optionally calls the real Qwen cloned-voice provider and confirms that audio deltas come back
+
+ Important:
+ - this is a repo developer workflow, not a generic npm-only operator command
+ - it expects the Oomi repo checkout, the Rails backend, and local provider env vars
+ - the real-provider replay can auto-enroll a disposable default sample voice profile from `assets/voice/source/nemu-enrollment-sample.mp3`
+
+ Assistant-final contract only:
+
+ ```bash
+ oomi openclaw debug assistant-final --text "Hey Justin! How is the testing going?" --json
+ ```
+
+ Full local backend replay:
+
+ ```bash
+ oomi openclaw debug tts-pipeline --text "When your voice reaches me, it gets turned into text, I read it and think about it, then I speak back through the managed chat session." --json
+ ```
+
+ Real Qwen provider replay:
+
+ ```bash
+ oomi openclaw debug tts-pipeline --text "When your voice reaches me, it gets turned into text, I read it and think about it, then I speak back through the managed chat session." --live-provider --env-file .env.local --provider-timeout-ms 20000 --json
+ ```
+
+ What a good result looks like:
+ - `backend.success = true`
+ - `managed.assistantSpeechFinal.present = true`
+ - `qwen.errorCode = null`
+ - `qwen.audioDeltaCount > 0` when `--live-provider` is used
+
+ This is the preferred pre-publish gate for managed voice regressions, because it is much faster than publishing to npm and testing through a live OpenClaw machine first.
+
  ## Persona Scaffolding
 
  Use the scaffold flow when OpenClaw needs to build a managed persona app that will live inside Oomi:
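The "good result" list in the README names `backend.success` with its prefix but `managed.*` and `qwen.*` without one. In the `--json` output of `tts-pipeline`, `runLocalTtsPipelineDebugCheck` nests the parsed Ruby replay output under a top-level `backend` key, so all four checks live under `backend`. The sketch below encodes the list as a function; `evaluateTtsPipelineResult` is a hypothetical helper, not part of oomi-ai, it treats an absent `errorCode` the same as `null` (an assumption), and the sample object is illustrative rather than real replay output.

```javascript
// Hypothetical helper: checks `oomi openclaw debug tts-pipeline --json` output
// against the README's "good result" list. Field paths assume the shape built by
// runLocalTtsPipelineDebugCheck: { assistant, backend, liveProvider, ... }.
function evaluateTtsPipelineResult(result) {
  const failures = [];
  const backend = result && result.backend;
  if (!backend) {
    // The non-JSON printer reports this case as "Backend replay: <no output>".
    return { ok: false, failures: ['backend replay produced no JSON output'] };
  }
  if (backend.success !== true) failures.push('backend.success is not true');
  if (backend.managed?.assistantSpeechFinal?.present !== true) {
    failures.push('backend.managed.assistantSpeechFinal.present is not true');
  }
  // Assumption: an absent errorCode counts as null.
  const errorCode = backend.qwen?.errorCode ?? null;
  if (errorCode !== null) failures.push(`backend.qwen.errorCode is ${errorCode}`);
  // Audio deltas are only expected when --live-provider was used.
  if (result.liveProvider && !(Number(backend.qwen?.audioDeltaCount) > 0)) {
    failures.push('backend.qwen.audioDeltaCount is not > 0');
  }
  return { ok: failures.length === 0, failures };
}

// Illustrative input only; real output comes from the Ruby replay harness.
const sample = {
  liveProvider: true,
  backend: {
    success: true,
    managed: { assistantSpeechFinal: { present: true } },
    qwen: { errorCode: null, audioDeltaCount: 4 },
  },
};
console.log(evaluateTtsPipelineResult(sample)); // { ok: true, failures: [] }
```

The command itself already throws after printing when `backend.success` is not true, so a check like this mainly adds the sidecar and live-audio criteria.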
@@ -192,7 +234,7 @@ The bridge status file is written locally and should roughly be interpreted as:
 
  For voice support, a `voice_session_*` failure should be treated as narrower than a full provider outage.
 
- ## Troubleshooting
+ ## Troubleshooting
 
  ### `invalid handshake: first request must be connect`
 
@@ -223,19 +265,32 @@ What to check:
 
  If the process is alive but runtime faults are being caught, expect `degraded` rather than an immediate hard stop.
 
- ### Voice STT works but the agent does not answer
+ ### Voice STT works but the agent does not answer
 
  This usually means one of these:
  - the managed gateway/device side is not actually ready
  - the bridge or agent run failed after delivery
  - the OpenClaw run stopped with an upstream provider `network_error`
 
- In that situation, inspect:
- - `~/.openclaw/logs/gateway.log`
- - `~/.openclaw/logs/gateway.err.log`
- - the relevant session JSONL in `~/.openclaw/agents/main/sessions/`
+ In that situation, inspect:
+ - `~/.openclaw/logs/gateway.log`
+ - `~/.openclaw/logs/gateway.err.log`
+ - the relevant session JSONL in `~/.openclaw/agents/main/sessions/`
+
+ ### Voice text works but cloned TTS fails with `MISSING_SPOKEN_METADATA`
+
+ Meaning:
+ - the assistant text arrived
+ - the backend voice relay never received valid hidden `metadata.spoken`
+
+ What to check:
+ - run the local replay gate before publishing:
+ - `oomi openclaw debug assistant-final --text "..."`
+ - `oomi openclaw debug tts-pipeline --text "..."`
+ - if the package local replay succeeds but the live machine fails, verify the OpenClaw machine is actually running the updated bridge binary
+ - if the local replay fails, fix the assistant-final contract first instead of debugging the browser or backend deployment
 
- ## Developer Notes
+ ## Developer Notes
 
  If you are inspecting this package on npm, the main architectural points are:
  - the extension path is the stable managed text contract
@@ -248,24 +303,38 @@ If you are inspecting this package on npm, the main architectural points are:
  - runtime fault isolation so local session failures are less likely to crash the whole provider
  - one shared hidden managed-voice speech metadata helper used by both the extension and the local bridge
 
- If you are developing the plugin, test the packaged surface with:
-
- ```bash
- cd packages/oomi-ai
- node --test test/*.test.mjs
- npm pack --dry-run
- ```
-
- ## Release Process
-
- Before publishing:
-
- ```bash
- cd packages/oomi-ai
- node --test test/*.test.mjs
- npm pack --dry-run
- ```
-
+ If you are developing the plugin, test the packaged surface with:
+
+ ```bash
+ cd packages/oomi-ai
+ node --test test/*.test.mjs
+ npm pack --dry-run
+ ```
+
+ For managed voice changes, do not stop at the package tests. Run the local replay gate from the repo root as well, especially before publishing:
+
+ ```bash
+ oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --json
+ oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --live-provider --env-file .env.local --provider-timeout-ms 20000 --json
+ ```
+
+ ## Release Process
+
+ Before publishing:
+
+ ```bash
+ cd packages/oomi-ai
+ node --test test/*.test.mjs
+ npm pack --dry-run
+ ```
+
+ For voice-related changes, also run the repo-backed local replay gate before publish:
+
+ ```bash
+ oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --json
+ oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --live-provider --env-file .env.local --provider-timeout-ms 20000 --json
+ ```
+
  Then publish the bumped version:
 
  ```bash
package/bin/oomi-ai.js CHANGED
@@ -39,13 +39,21 @@ const BRIDGE_CONNECT_CHALLENGE_TIMEOUT_MS = parsePositiveInteger(
  process.env.OOMI_BRIDGE_CONNECT_CHALLENGE_TIMEOUT_MS,
  3000
  );
- const BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS = parsePositiveInteger(
- process.env.OOMI_BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS,
- 30000
- );
- const BRIDGE_LAUNCHD_LABEL = 'ai.oomi.bridge';
- const DEVICE_IDENTITY_PATH = path.join(os.homedir(), '.openclaw', 'identity', 'device.json');
- const ED25519_SPKI_PREFIX = Buffer.from('302a300506032b6570032100', 'hex');
+ const BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS = parsePositiveInteger(
+ process.env.OOMI_BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS,
+ 30000
+ );
+ const BRIDGE_LAUNCHD_LABEL = 'ai.oomi.bridge';
+ const DEBUG_PROVIDER_ENV_KEYS = [
+ 'QWEN_REALTIME_API_KEY',
+ 'QWEN_REALTIME_BASE_URL',
+ 'QWEN_REALTIME_ASR_MODEL',
+ 'QWEN_REALTIME_TTS_MODEL',
+ 'QWEN_REALTIME_TTS_VOICE',
+ 'QWEN_REALTIME_LANGUAGE',
+ ];
+ const DEVICE_IDENTITY_PATH = path.join(os.homedir(), '.openclaw', 'identity', 'device.json');
+ const ED25519_SPKI_PREFIX = Buffer.from('302a300506032b6570032100', 'hex');
 
  function parsePositiveInteger(value, fallback) {
  const num = Number(value);
@@ -169,10 +177,14 @@ Commands:
  openclaw install
  Install agent instructions and the Oomi skill into OpenClaw.
 
- openclaw bridge [start|ensure|stop|restart|ps]
- Manage local OpenClaw-to-Oomi bridge lifecycle (singleton).
- openclaw bridge service [install|start|stop|restart|status|uninstall]
- Manage macOS launchd bridge supervision.
+ openclaw bridge [start|ensure|stop|restart|ps]
+ Manage local OpenClaw-to-Oomi bridge lifecycle (singleton).
+ openclaw bridge service [install|start|stop|restart|status|uninstall]
+ Manage macOS launchd bridge supervision.
+ openclaw debug assistant-final
+ Replay an assistant chat.final frame through spoken-metadata normalization.
+ openclaw debug tts-pipeline
+ Replay an assistant chat.final through local backend voice handling.
 
  openclaw pair
  Pair this OpenClaw host with Oomi and start bridge (single command).
@@ -225,9 +237,20 @@ Common flags:
  --device-id ID Bridge device identifier (default: host name)
  --device-token TOKEN Existing bridge device token
  --show-secrets Print full token values in diagnostic output
- --json Print pairing result as JSON (for automation)
- --backend-url URL Override Oomi backend URL
- --root PATH Override repo root path for persona discovery
+ --json Print pairing result as JSON (for automation)
+ --text TEXT Assistant text for local debug frame replay
+ --frame-file PATH Read a raw gateway frame from disk for local debug replay
+ --frame-json JSON Use raw gateway frame JSON text for local debug replay
+ --session-id ID Debug session id override (default: ms_debug_local)
+ --user-text TEXT User utterance text used for backend voice replay
+ --live-provider Use the real Qwen TTS provider in local debug replay
+ --env-file PATH Load provider env vars from a specific env file (default: <repo>/.env.local)
+ --provider-timeout-ms N
+ Timeout in ms for live provider audio during local debug replay
+ --backend-url URL Override Oomi backend URL
+ --root PATH Override repo root path for persona discovery
+ --role ROLE Message role override for local debug frame replay
+ --omit-role Omit message.role in the generated local debug frame
  --name NAME Persona display name (for create)
  --description TEXT Persona description (for scaffold)
  --slug SLUG Explicit slug override (for create-managed)
@@ -261,13 +284,43 @@ function readFile(filePath) {
  return fs.readFileSync(filePath, 'utf-8');
  }
 
- function writeFile(filePath, content, options = undefined) {
- fs.writeFileSync(filePath, content, options);
- }
-
- function xmlEscape(value) {
- return String(value)
- .replaceAll('&', '&amp;')
+ function writeFile(filePath, content, options = undefined) {
+ fs.writeFileSync(filePath, content, options);
+ }
+
+ function parseDotEnvLine(line) {
+ const trimmed = String(line || '').trim();
+ if (!trimmed || trimmed.startsWith('#')) return null;
+ const separatorIndex = trimmed.indexOf('=');
+ if (separatorIndex <= 0) return null;
+ const key = trimmed.slice(0, separatorIndex).trim();
+ if (!key) return null;
+ let value = trimmed.slice(separatorIndex + 1).trim();
+ if ((value.startsWith('"') && value.endsWith('"')) || (value.startsWith("'") && value.endsWith("'"))) {
+ value = value.slice(1, -1);
+ }
+ return { key, value };
+ }
+
+ function loadEnvFile(filePath, keys = []) {
+ if (!filePath || !fs.existsSync(filePath)) {
+ throw new Error(`Environment file not found: ${filePath}`);
+ }
+ const selectedKeys = Array.isArray(keys) && keys.length ? new Set(keys) : null;
+ const entries = {};
+ const lines = readFile(filePath).split(/\r?\n/);
+ for (const line of lines) {
+ const parsed = parseDotEnvLine(line);
+ if (!parsed) continue;
+ if (selectedKeys && !selectedKeys.has(parsed.key)) continue;
+ entries[parsed.key] = parsed.value;
+ }
+ return entries;
+ }
+
+ function xmlEscape(value) {
+ return String(value)
+ .replaceAll('&', '&amp;')
  .replaceAll('<', '&lt;')
  .replaceAll('>', '&gt;')
  .replaceAll('"', '&quot;')
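`loadEnvFile` only forwards keys listed in `DEBUG_PROVIDER_ENV_KEYS`, and `runBundledRubyScript` layers them over the parent environment (`{ ...process.env, ...env }`), so values from the file win over the shell. Because `parseDotEnvLine` is deliberately minimal, two common `.env` idioms may surprise: an `export ` prefix becomes part of the key, so the allowlist silently drops that line, and a trailing `# comment` stays in the value. The sketch copies `parseDotEnvLine` from the diff and swaps the file read for a string; `parseEnvText` and the sample contents are made up for illustration.

```javascript
// Copy of parseDotEnvLine from the diff above.
function parseDotEnvLine(line) {
  const trimmed = String(line || '').trim();
  if (!trimmed || trimmed.startsWith('#')) return null;
  const separatorIndex = trimmed.indexOf('=');
  if (separatorIndex <= 0) return null;
  const key = trimmed.slice(0, separatorIndex).trim();
  if (!key) return null;
  let value = trimmed.slice(separatorIndex + 1).trim();
  // Strip one pair of matching surrounding quotes; no escape handling.
  if ((value.startsWith('"') && value.endsWith('"')) || (value.startsWith("'") && value.endsWith("'"))) {
    value = value.slice(1, -1);
  }
  return { key, value };
}

// Hypothetical string-based stand-in for loadEnvFile (same filtering rules).
function parseEnvText(text, keys = []) {
  const selectedKeys = keys.length ? new Set(keys) : null;
  const entries = {};
  for (const line of text.split(/\r?\n/)) {
    const parsed = parseDotEnvLine(line);
    if (!parsed) continue;
    if (selectedKeys && !selectedKeys.has(parsed.key)) continue; // allowlist
    entries[parsed.key] = parsed.value; // a later duplicate key wins
  }
  return entries;
}

// Made-up .env.local contents.
const envText = [
  '# comment lines are skipped',
  'QWEN_REALTIME_API_KEY="sk-example"',
  'QWEN_REALTIME_LANGUAGE=en # inline comments are kept',
  'export QWEN_REALTIME_TTS_MODEL=some-model',
  'DATABASE_URL=postgres://localhost/dev',
].join('\n');

const env = parseEnvText(envText, [
  'QWEN_REALTIME_API_KEY',
  'QWEN_REALTIME_LANGUAGE',
  'QWEN_REALTIME_TTS_MODEL',
]);
console.log(env.QWEN_REALTIME_API_KEY); // sk-example (quotes stripped)
console.log(env.QWEN_REALTIME_LANGUAGE); // en # inline comments are kept
console.log('QWEN_REALTIME_TTS_MODEL' in env); // false: parsed key is "export QWEN_REALTIME_TTS_MODEL"
```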
@@ -356,9 +409,9 @@ function ensureDir(dirPath) {
  }
  }
 
- function findRepoRoot(startDir) {
- let current = startDir;
- for (let i = 0; i < 6; i += 1) {
+ function findRepoRoot(startDir) {
+ let current = startDir;
+ for (let i = 0; i < 6; i += 1) {
  const personasDir = path.join(current, 'personas');
  const skillsDir = path.join(current, 'skills', 'oomi');
  if (fs.existsSync(personasDir) || fs.existsSync(skillsDir)) {
@@ -367,11 +420,23 @@ function findRepoRoot(startDir) {
  const parent = path.dirname(current);
  if (parent === current) break;
  current = parent;
- }
- return null;
- }
-
- function resolveSkillSource(cliRoot) {
+ }
+ return null;
+ }
+
+ function resolveRepoRoot(rootFlag) {
+ const explicitRoot =
+ typeof rootFlag === 'string' && rootFlag.trim()
+ ? path.resolve(rootFlag.trim())
+ : '';
+ const repoRoot = explicitRoot || findRepoRoot(process.cwd()) || findRepoRoot(PACKAGE_ROOT);
+ if (!repoRoot) {
+ throw new Error('Could not locate repo root. Use --root <repo root>.');
+ }
+ return repoRoot;
+ }
+
+ function resolveSkillSource(cliRoot) {
  const packaged = path.join(PACKAGE_ROOT, 'skills', 'oomi');
  if (fs.existsSync(packaged)) {
  return packaged;
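`resolveRepoRoot` tries three sources in order: an explicit `--root` (accepted without checking for `personas/` or `skills/oomi/`), a walk up from the current working directory, and a walk up from `PACKAGE_ROOT`. Each walk inspects the start directory plus at most five ancestors. The filesystem-free model below makes that order testable. `hasMarker` and `parentDir` are hypothetical stand-ins for the `fs.existsSync` checks and `path.dirname`, the path handling is POSIX-only, the `path.resolve` applied to `--root` is omitted, and it assumes the lines the diff elides inside `findRepoRoot` return `current` when a marker is found.

```javascript
// Hypothetical, filesystem-free model of findRepoRoot/resolveRepoRoot.
// hasMarker(dir) stands in for fs.existsSync(<dir>/personas) || fs.existsSync(<dir>/skills/oomi).

// Minimal POSIX dirname: '/a/b' -> '/a', '/a' -> '/', '/' -> '/'.
function parentDir(dir) {
  const trimmed = dir.replace(/\/+$/, '');
  const idx = trimmed.lastIndexOf('/');
  return idx <= 0 ? '/' : trimmed.slice(0, idx);
}

function findRepoRoot(startDir, hasMarker) {
  let current = startDir;
  for (let i = 0; i < 6; i += 1) { // start directory plus at most five ancestors
    if (hasMarker(current)) return current; // assumed body of the elided lines
    const parent = parentDir(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return null;
}

function resolveRepoRoot(rootFlag, { cwd, packageRoot, hasMarker }) {
  // --root wins outright and is not checked for markers.
  const explicitRoot = typeof rootFlag === 'string' && rootFlag.trim() ? rootFlag.trim() : '';
  const repoRoot = explicitRoot || findRepoRoot(cwd, hasMarker) || findRepoRoot(packageRoot, hasMarker);
  if (!repoRoot) {
    throw new Error('Could not locate repo root. Use --root <repo root>.');
  }
  return repoRoot;
}

const hasMarker = (dir) => dir === '/work/oomi';
const opts = { cwd: '/work/oomi/a/b/c/d/e/f', packageRoot: '/work/oomi/packages/oomi-ai', hasMarker };
// cwd is six levels below the repo root, so the cwd walk gives up and the
// PACKAGE_ROOT walk finds the root instead.
console.log(resolveRepoRoot(undefined, opts)); // /work/oomi
```

In practice the `PACKAGE_ROOT` fallback only helps when the CLI runs from a repo checkout of `packages/oomi-ai`; a globally installed copy would typically sit outside the repo.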
@@ -1698,7 +1763,7 @@ function summarizeVoiceFrameContract(frameText) {
  };
  }
 
- function ensureVoiceAssistantSpokenMetadata(frameText) {
+ function ensureAssistantSpokenMetadata(frameText) {
  const frame = parseJsonPayload(frameText);
  if (!frame || typeof frame !== 'object') {
  return { frameText, changed: false, reason: '' };
@@ -1753,6 +1818,395 @@ function ensureVoiceAssistantSpokenMetadata(frameText) {
  reason: normalizedExplicitSpoken ? 'normalized' : (messageRole ? 'synthesized' : 'synthesized_missing_role'),
  };
  }
+
+ function normalizeAssistantGatewayFrame(sessionId, frameText) {
+ const scope = classifyBridgeSessionScope(sessionId);
+ const summary = summarizeVoiceFrameContract(frameText);
+ if (!summary.parseable || summary.event !== 'chat' || summary.state !== 'final') {
+ return {
+ frameText,
+ changed: false,
+ reason: '',
+ scope,
+ summary,
+ };
+ }
+
+ const normalized = ensureAssistantSpokenMetadata(frameText);
+ return {
+ ...normalized,
+ scope,
+ summary,
+ };
+ }
+
+ function buildAssistantFinalDebugFrame({ sessionKey, text, role }) {
+ const trimmedSessionKey =
+ typeof sessionKey === 'string' && sessionKey.trim()
+ ? sessionKey.trim()
+ : 'agent:main:webchat:channel:oomi';
+ const message = {
+ content: String(text || ''),
+ };
+ if (typeof role === 'string' && role.trim()) {
+ message.role = role.trim();
+ }
+ return JSON.stringify({
+ type: 'event',
+ event: 'chat',
+ payload: {
+ sessionKey: trimmedSessionKey,
+ state: 'final',
+ message,
+ },
+ });
+ }
+
+ function extractSpokenMetadata(frameText) {
+ const payload = parseJsonPayload(frameText);
+ const message =
+ payload &&
+ payload.payload &&
+ typeof payload.payload === 'object' &&
+ payload.payload.message &&
+ typeof payload.payload.message === 'object'
+ ? payload.payload.message
+ : null;
+ const metadata =
+ message &&
+ message.metadata &&
+ typeof message.metadata === 'object' &&
+ !Array.isArray(message.metadata)
+ ? message.metadata
+ : {};
+ return normalizeSpokenMetadata(metadata.spoken);
+ }
+
+ function runAssistantFinalDebugCheck(options = {}) {
+ const sessionId =
+ typeof options.sessionId === 'string' && options.sessionId.trim()
+ ? options.sessionId.trim()
+ : 'ms_debug_local';
+ const sessionKey =
+ typeof options.sessionKey === 'string' && options.sessionKey.trim()
+ ? options.sessionKey.trim()
+ : 'agent:main:webchat:channel:oomi';
+ const role =
+ options.omitRole
+ ? ''
+ : (typeof options.role === 'string' && options.role.trim() ? options.role.trim() : 'assistant');
+
+ const rawFrameText =
+ typeof options.frameText === 'string' && options.frameText.trim()
+ ? options.frameText
+ : buildAssistantFinalDebugFrame({
+ sessionKey,
+ text: options.text,
+ role,
+ });
+
+ const before = summarizeVoiceFrameContract(rawFrameText);
+ const normalized = normalizeAssistantGatewayFrame(sessionId, rawFrameText);
+ const after = summarizeVoiceFrameContract(normalized.frameText);
+ const spoken = extractSpokenMetadata(normalized.frameText);
+
+ return {
+ sessionId,
+ sessionKey,
+ scope: normalized.scope,
+ changed: normalized.changed,
+ reason: normalized.reason,
+ before,
+ after,
+ spoken,
+ frameText: normalized.frameText,
+ };
+ }
+
+ function printAssistantFinalDebugResult(result, asJson) {
+ if (asJson) {
+ console.log(JSON.stringify(result, null, 2));
+ return;
+ }
+
+ console.log(`Session id: ${result.sessionId}`);
+ console.log(`Session key: ${result.sessionKey}`);
+ console.log(`Scope: ${result.scope}`);
+ console.log(`Changed: ${result.changed ? 'yes' : 'no'}${result.reason ? ` (${result.reason})` : ''}`);
+ console.log(
+ `Before: event=${result.before.event || '<none>'} state=${result.before.state || '<none>'} role=${result.before.role || '<none>'} spoken=${result.before.spokenNormalized ? 'yes' : 'no'}`
+ );
+ console.log(
+ `After: event=${result.after.event || '<none>'} state=${result.after.state || '<none>'} role=${result.after.role || '<none>'} spoken=${result.after.spokenNormalized ? 'yes' : 'no'}`
+ );
+ if (result.spoken) {
+ console.log(`Spoken text: ${result.spoken.text}`);
+ console.log(`Segments: ${Array.isArray(result.spoken.segments) ? result.spoken.segments.length : 0}`);
+ if (typeof result.spoken.instructions === 'string' && result.spoken.instructions.trim()) {
+ console.log(`Instructions: ${result.spoken.instructions}`);
+ }
+ } else {
+ console.log('Spoken text: <missing>');
+ }
+ }
+
+ function resolveCommandFromPath(commandName) {
+ const normalized = String(commandName || '').trim();
+ if (!normalized) return '';
+ try {
+ const probe = spawnSync(process.platform === 'win32' ? 'where' : 'which', [normalized], {
+ encoding: 'utf8',
+ stdio: ['ignore', 'pipe', 'ignore'],
+ });
+ if (probe.status !== 0) return '';
+ const firstLine = String(probe.stdout || '')
+ .split(/\r?\n/)
+ .map((line) => line.trim())
+ .find(Boolean);
+ return firstLine || '';
+ } catch {
+ return '';
+ }
+ }
+
+ function resolveExecutable(candidates = []) {
+ for (const candidate of candidates) {
+ if (!candidate) continue;
+ const value = String(candidate).trim();
+ if (!value) continue;
+ if (path.isAbsolute(value) && fs.existsSync(value)) {
+ return value;
+ }
+ if (value.includes(path.sep) || value.includes('/')) {
+ const resolved = path.resolve(value);
+ if (fs.existsSync(resolved)) {
+ return resolved;
+ }
+ continue;
+ }
+ const fromPath = resolveCommandFromPath(value);
+ if (fromPath) {
+ return fromPath;
+ }
+ }
+ return '';
+ }
+
+ function resolveBackendRoot(rootFlag) {
+ const repoRoot = resolveRepoRoot(rootFlag);
+ const backendRoot = path.join(repoRoot, 'apps', 'backend');
+ if (!fs.existsSync(backendRoot)) {
+ throw new Error(`Could not locate backend app at ${backendRoot}`);
+ }
+ return backendRoot;
+ }
+
+ function resolveRubyExecutable() {
+ const candidates = [
+ process.env.OOMI_RUBY_BIN,
+ process.env.RUBY,
+ process.platform === 'win32' ? 'ruby.exe' : 'ruby',
+ process.platform === 'win32' ? 'ruby' : '',
+ process.platform === 'win32' ? 'C:\\Ruby33-x64\\bin\\ruby.exe' : '',
+ ];
+ const executable = resolveExecutable(candidates);
+ if (!executable) {
+ throw new Error('Ruby executable not found. Set OOMI_RUBY_BIN or install Ruby locally.');
+ }
+ return executable;
+ }
+
+ function resolveBundleExecutable() {
+ const candidates = [
+ process.env.OOMI_BUNDLE_BIN,
+ process.platform === 'win32' ? 'bundle.bat' : 'bundle',
+ 'bundle',
+ process.platform === 'win32' ? 'C:\\Ruby33-x64\\bin\\bundle.bat' : '',
+ ];
+ const executable = resolveExecutable(candidates);
+ if (!executable) {
+ throw new Error('Bundler executable not found. Set OOMI_BUNDLE_BIN or install Bundler locally.');
+ }
+ return executable;
+ }
+
+ function shellQuote(value) {
+ const text = String(value);
+ if (process.platform === 'win32') {
+ return `"${text.replace(/"/g, '""')}"`;
+ }
+ return `'${text.replace(/'/g, `'\\''`)}'`;
+ }
+
+ async function runBundledRubyScript({ backendRoot, scriptPath, inputFile, env = undefined }) {
+ const rubyExecutable = resolveRubyExecutable();
+ const bundleExecutable = resolveBundleExecutable();
+ const commandText = process.platform === 'win32'
+ ? [bundleExecutable, 'exec', rubyExecutable, scriptPath, '--input-file', inputFile].map(shellQuote).join(' ')
+ : '';
+ const childEnv = env ? { ...process.env, ...env } : process.env;
+
+ return await new Promise((resolve, reject) => {
+ const child = process.platform === 'win32'
+ ? spawn(commandText, [], {
+ cwd: backendRoot,
+ shell: true,
+ env: childEnv,
+ stdio: ['ignore', 'pipe', 'pipe'],
+ })
+ : spawn(bundleExecutable, ['exec', rubyExecutable, scriptPath, '--input-file', inputFile], {
+ cwd: backendRoot,
+ env: childEnv,
+ stdio: ['ignore', 'pipe', 'pipe'],
+ });
+
+ let stdout = '';
+ let stderr = '';
+ child.stdout.on('data', (chunk) => {
+ stdout += chunk.toString();
+ });
+ child.stderr.on('data', (chunk) => {
+ stderr += chunk.toString();
+ });
+ child.on('error', reject);
+ child.on('close', (code) => {
+ resolve({ code: Number(code || 0), stdout, stderr });
+ });
+ });
+ }
+
+ async function runLocalTtsPipelineDebugCheck(options = {}) {
+ const assistant = runAssistantFinalDebugCheck(options);
+ const repoRoot = resolveRepoRoot(options.root);
+ const backendRoot = resolveBackendRoot(options.root);
+ const scriptPath = path.join(backendRoot, 'bin', 'voice_tts_replay.rb');
+ if (!fs.existsSync(scriptPath)) {
+ throw new Error(`Backend replay script not found: ${scriptPath}`);
+ }
+
+ const inputPayload = {
+ repoRoot,
+ sessionId: assistant.sessionId,
+ sessionKey: assistant.sessionKey,
+ frameText: assistant.frameText,
+ userText:
+ typeof options.userText === 'string' && options.userText.trim()
+ ? options.userText.trim()
+ : 'local debug utterance',
+ liveProvider: Boolean(options.liveProvider),
+ providerTimeoutMs: parsePositiveInteger(options.providerTimeoutMs, 15000),
+ };
+ let childEnv = undefined;
+ let resolvedEnvFile = '';
+ if (options.liveProvider) {
+ resolvedEnvFile =
+ typeof options.envFile === 'string' && options.envFile.trim()
+ ? path.resolve(options.envFile.trim())
+ : path.join(repoRoot, '.env.local');
+ childEnv = loadEnvFile(resolvedEnvFile, DEBUG_PROVIDER_ENV_KEYS);
+ }
+ const inputFile = path.join(os.tmpdir(), `oomi-voice-replay-${randomUUID()}.json`);
+ writeFile(inputFile, JSON.stringify(inputPayload, null, 2) + '\n');
+
+ try {
+ const backend = await runBundledRubyScript({ backendRoot, scriptPath, inputFile, env: childEnv });
+ const parsed = backend.stdout.trim() ? JSON.parse(backend.stdout) : null;
+ return {
+ assistant,
+ backend: parsed,
+ backendExitCode: backend.code,
+ backendStderr: backend.stderr.trim(),
+ liveProvider: Boolean(options.liveProvider),
+ envFile: resolvedEnvFile || null,
+ };
+ } finally {
+ try {
+ fs.unlinkSync(inputFile);
+ } catch {
+ // no-op
+ }
+ }
+ }
+
+ function printTtsPipelineDebugResult(result, asJson) {
+ if (asJson) {
+ console.log(JSON.stringify(result, null, 2));
+ return;
+ }
+
+ console.log(`Assistant normalization: ${result.assistant.changed ? 'changed' : 'unchanged'}${result.assistant.reason ? ` (${result.assistant.reason})` : ''}`);
+ console.log(`Assistant spoken segments: ${Array.isArray(result.assistant.spoken?.segments) ? result.assistant.spoken.segments.length : 0}`);
+ if (!result.backend) {
+ console.log('Backend replay: <no output>');
+ return;
+ }
+ console.log(`Backend replay success: ${result.backend.success ? 'yes' : 'no'}`);
+ console.log(`Managed speech sidecar: ${result.backend.managed?.assistantSpeechFinal?.present ? 'yes' : 'no'}`);
+ console.log(`Backend final text: ${result.backend.qwen?.assistantTextFinal || '<missing>'}`);
+ console.log(`Backend TTS appends: ${Array.isArray(result.backend.qwen?.ttsAppends) ? result.backend.qwen.ttsAppends.length : 0}`);
+ console.log(`Backend TTS commits: ${Number(result.backend.qwen?.commitCount || 0)}`);
+ if (result.liveProvider) {
+ console.log(`Live provider audio deltas: ${Number(result.backend.qwen?.audioDeltaCount || 0)}`);
+ console.log(`Live provider audio bytes (base64): ${Number(result.backend.qwen?.audioDeltaBytes || 0)}`);
+ console.log(`Live provider timeout: ${result.backend.qwen?.providerTimedOut ? 'yes' : 'no'}`);
+ }
+ if (result.backend.qwen?.errorCode) {
+ console.log(`Backend error: ${result.backend.qwen.errorCode}`);
+ }
+ if (result.backendStderr) {
+ console.log(`Backend stderr: ${result.backendStderr}`);
+ }
+ }
+
+ async function handleOpenclawDebugCommand(action, flags) {
+ const normalizedAction = String(action || '').trim().toLowerCase();
+ const frameFile =
+ typeof flags['frame-file'] === 'string' && flags['frame-file'].trim()
+ ? path.resolve(flags['frame-file'])
+ : '';
+ const frameText =
+ frameFile
+ ? readFile(frameFile)
+ : (typeof flags['frame-json'] === 'string' && flags['frame-json'].trim() ? flags['frame-json'] : '');
+ const text = typeof flags.text === 'string' ? flags.text : '';
+
+ if (!frameText && !text.trim()) {
+ throw new Error(
+ 'Assistant text or frame input is required. Usage: oomi openclaw debug assistant-final --text "<assistant text>"'
+ );
+ }
+
+ const debugOptions = {
+ sessionId: flags['session-id'],
+ sessionKey: flags['session-key'],
+ role: flags.role,
+ omitRole: isTruthyFlag(flags['omit-role']),
+ text,
+ frameText,
+ root: flags.root,
+ userText: flags['user-text'],
+ liveProvider: isTruthyFlag(flags['live-provider']),
+ envFile: flags['env-file'],
+ providerTimeoutMs: flags['provider-timeout-ms'],
+ };
+
+ if (normalizedAction === 'assistant-final') {
+ const result = runAssistantFinalDebugCheck(debugOptions);
+ printAssistantFinalDebugResult(result, isTruthyFlag(flags.json));
+ return;
+ }
+
+ if (normalizedAction === 'tts-pipeline') {
+ const result = await runLocalTtsPipelineDebugCheck(debugOptions);
+ printTtsPipelineDebugResult(result, isTruthyFlag(flags.json));
+ if (!result.backend?.success) {
+ throw new Error(result.backend?.qwen?.errorCode || 'Local backend TTS replay failed.');
+ }
+ return;
+ }
+
+ throw new Error('Unknown debug action: ' + normalizedAction + '. Use: oomi openclaw debug assistant-final|tts-pipeline');
+ }
 
  function extractCorrelationId(params) {
  if (!params || typeof params !== 'object') return '';
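`shellQuote` has two branches, but inside `runBundledRubyScript` it is only reached on `win32`: there the command is joined into one string and run with `shell: true`, while on other platforms `spawn` receives an argument array with no shell, so no quoting is involved. The sketch splits the two branches into standalone functions to show what each produces; `shellQuotePosix` and `shellQuoteWin32` are hypothetical names. The POSIX branch uses the usual close-quote, escaped-quote, reopen sequence for embedded single quotes.

```javascript
// POSIX branch of shellQuote from the diff: wrap in single quotes and replace
// each embedded ' with '\'' (close the quote, escaped quote, reopen).
function shellQuotePosix(value) {
  const text = String(value);
  return `'${text.replace(/'/g, `'\\''`)}'`;
}

// win32 branch: wrap in double quotes and double any embedded ".
function shellQuoteWin32(value) {
  const text = String(value);
  return `"${text.replace(/"/g, '""')}"`;
}

console.log(shellQuotePosix("it's")); // 'it'\''s'
console.log(shellQuoteWin32('say "hi"')); // "say ""hi"""
console.log(shellQuoteWin32('C:\\Program Files\\Ruby\\bin\\ruby.exe')); // "C:\Program Files\Ruby\bin\ruby.exe"
```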
@@ -2987,18 +3441,17 @@ async function startOpenclawBridge(flags) {
2987
3441
 
2988
3442
  gatewaySocket.on('message', runBridgeCallbackSafely((gatewayRaw) => {
2989
3443
  let frame = typeof gatewayRaw === 'string' ? gatewayRaw : gatewayRaw.toString();
2990
- if (classifyBridgeSessionScope(sessionId) === 'voice') {
2991
- const beforeSummary = summarizeVoiceFrameContract(frame);
2992
- const spokenNormalized = ensureVoiceAssistantSpokenMetadata(frame);
2993
- if (spokenNormalized.changed) {
2994
- frame = spokenNormalized.frameText;
3444
+ const spokenNormalized = normalizeAssistantGatewayFrame(sessionId, frame);
3445
+ if (spokenNormalized.changed) {
3446
+ frame = spokenNormalized.frameText;
3447
+ if (spokenNormalized.scope === 'voice') {
2995
3448
  console.log(`[bridge] voice.spoken_metadata.${spokenNormalized.reason} ${sessionId} ${JSON.stringify({
2996
- before: beforeSummary,
3449
+ before: spokenNormalized.summary,
2997
3450
  after: summarizeVoiceFrameContract(frame),
2998
3451
  })}`);
2999
- } else if (beforeSummary.event === 'chat' && beforeSummary.state === 'final') {
3000
- console.log(`[bridge] voice.chat.final ${sessionId} ${JSON.stringify(beforeSummary)}`);
3001
3452
  }
3453
+ } else if (spokenNormalized.scope === 'voice' && spokenNormalized.summary.event === 'chat' && spokenNormalized.summary.state === 'final') {
3454
+ console.log(`[bridge] voice.chat.final ${sessionId} ${JSON.stringify(spokenNormalized.summary)}`);
3002
3455
  }
3003
3456
  const gatewayPayload = parseJsonPayload(frame);
3004
3457
  if (gatewayPayload?.event === 'connect.challenge') {
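In this hunk the bridge now passes every gateway frame through `normalizeAssistantGatewayFrame`, not only voice-scope frames. A rewritten frame is forwarded whatever its scope, while logging remains voice-only. The sketch below reproduces that decision logic on hand-built result objects. The result shape (`changed`, `frameText`, `scope`, `reason`, `summary`) is inferred from the fields the call site reads. The helper names, the `'inferred'` reason, and the non-voice `'chat'` scope value are hypothetical. The real log lines also append before/after JSON summaries, which are omitted here.

```javascript
// Hypothetical helpers reproducing the call-site logic in startOpenclawBridge.
// `result` mimics what normalizeAssistantGatewayFrame returns, as read off
// the fields the diff accesses; its real construction is not shown.

// Frame actually sent onward: the rewritten text whenever normalization
// changed it, for any session scope.
function forwardedFrame(frame, result) {
  return result.changed ? result.frameText : frame;
}

// Log prefix the bridge would print, or null when it stays silent.
function bridgeLogLabel(sessionId, result) {
  if (result.changed) {
    return result.scope === 'voice'
      ? `[bridge] voice.spoken_metadata.${result.reason} ${sessionId}`
      : null;
  }
  if (result.scope === 'voice' && result.summary.event === 'chat' && result.summary.state === 'final') {
    return `[bridge] voice.chat.final ${sessionId}`;
  }
  return null;
}
```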
@@ -4139,10 +4592,15 @@ async function main() {
4139
4592
  return;
4140
4593
  }
4141
4594
 
4142
- if (command === 'openclaw' && subcommand === 'plugin') {
4143
- printOpenclawPluginSetup(args.flags);
4144
- return;
4145
- }
4595
+ if (command === 'openclaw' && subcommand === 'plugin') {
4596
+ printOpenclawPluginSetup(args.flags);
4597
+ return;
4598
+ }
4599
+
4600
+ if (command === 'openclaw' && subcommand === 'debug') {
4601
+ await handleOpenclawDebugCommand(args.positionals[0], args.flags);
4602
+ return;
4603
+ }
4146
4604
 
4147
4605
  if (command === 'personas' && subcommand === 'sync') {
4148
4606
  await syncPersonas({ backendUrl: args.flags['backend-url'], root: args.flags.root });
@@ -4257,7 +4715,9 @@ if (__isDirectExecution) {
4257
4715
 
4258
4716
  export {
4259
4717
  prepareGatewayFrameForLocalGateway,
4260
- ensureVoiceAssistantSpokenMetadata,
4718
+ ensureAssistantSpokenMetadata,
4719
+ normalizeAssistantGatewayFrame,
4720
+ runAssistantFinalDebugCheck,
4261
4721
  buildBridgeLaunchAgentPlist,
4262
4722
  classifyBridgeFailure,
4263
4723
  classifyBridgeSessionScope,
@@ -2,6 +2,8 @@ function trimString(value, fallback = '') {
2
2
  return typeof value === 'string' && value.trim() ? value.trim() : fallback;
3
3
  }
4
4
 
5
+ const ELLIPSIS_PLACEHOLDER = '__OOMI_ELLIPSIS__';
6
+
5
7
  function stripAvatarCommandTags(text) {
6
8
  return text.replace(/\[(anim|animation|face|expression|emotion|gesture|look|gaze):[^\]]+\]/gi, ' ');
7
9
  }
@@ -70,15 +72,19 @@ function normalizeSpeechText(text) {
70
72
  .replace(/\*\*(.*?)\*\*/g, '$1')
71
73
  .replace(/__(.*?)__/g, '$1')
72
74
  .replace(/`([^`]+)`/g, '$1')
73
- .replace(/[\u2013\u2014]/g, ', ')
74
- .replace(/\u2026/g, '...')
75
- .replace(/\s+/g, ' ')
76
- .replace(/\s+([,.;!?])/g, '$1')
77
- .replace(/([,.;!?])(?=[^\s])/g, '$1 ')
78
- .replace(/,\s*,+/g, ', ')
79
- .replace(/\s+/g, ' ')
80
- .trim();
81
- }
75
+ .replace(/[\u2013\u2014]/g, ', ')
76
+ .replace(/\u2026/g, ELLIPSIS_PLACEHOLDER)
77
+ .replace(/\.{3,}/g, ELLIPSIS_PLACEHOLDER)
78
+ .replace(/\s+/g, ' ')
79
+ .replace(/\s+([,.;!?])/g, '$1')
80
+ .replace(/([,;!?])(?=[^\s])/g, '$1 ')
81
+ .replace(/(\.)(?=[^\s.])/g, '$1 ')
82
+ .replace(/,\s*,+/g, ', ')
83
+ .replace(new RegExp(`${ELLIPSIS_PLACEHOLDER}(?=[^\\s,.;!?])`, 'g'), `${ELLIPSIS_PLACEHOLDER} `)
84
+ .replace(new RegExp(ELLIPSIS_PLACEHOLDER, 'g'), '...')
85
+ .replace(/\s+/g, ' ')
86
+ .trim();
87
+ }
82
88
 
83
89
  function splitSpeechSegments(text) {
84
90
  const normalized = normalizeSpeechText(text);
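The old replace chain inserted a space after every `.` that was followed by a non-space character. That includes the dots inside an ellipsis, so `...` was turned into `. . .`. The new chain first swaps ellipses for `ELLIPSIS_PLACEHOLDER`, runs the punctuation-spacing rules, and then restores `...`. The sketch below reproduces only the replace steps visible in this hunk, under hypothetical function names. The earlier steps of the real `normalizeSpeechText` fall outside the diff and are omitted.

```javascript
const ELLIPSIS_PLACEHOLDER = '__OOMI_ELLIPSIS__';

// New chain: ellipses are hidden behind a placeholder while spacing rules run.
function normalizePauses(text) {
  return text
    .replace(/\*\*(.*?)\*\*/g, '$1') // strip **bold**
    .replace(/__(.*?)__/g, '$1') // strip __bold__
    .replace(/`([^`]+)`/g, '$1') // strip `code`
    .replace(/[\u2013\u2014]/g, ', ') // en/em dash becomes a comma pause
    .replace(/\u2026/g, ELLIPSIS_PLACEHOLDER) // protect the single-character ellipsis
    .replace(/\.{3,}/g, ELLIPSIS_PLACEHOLDER) // protect "..." and longer dot runs
    .replace(/\s+/g, ' ')
    .replace(/\s+([,.;!?])/g, '$1') // no space before punctuation
    .replace(/([,;!?])(?=[^\s])/g, '$1 ') // space after , ; ! ?
    .replace(/(\.)(?=[^\s.])/g, '$1 ') // space after a single period
    .replace(/,\s*,+/g, ', ') // collapse doubled commas
    .replace(new RegExp(`${ELLIPSIS_PLACEHOLDER}(?=[^\\s,.;!?])`, 'g'), `${ELLIPSIS_PLACEHOLDER} `)
    .replace(new RegExp(ELLIPSIS_PLACEHOLDER, 'g'), '...') // restore the pause
    .replace(/\s+/g, ' ')
    .trim();
}

// Old chain, for comparison: the spacing rule also fires between the dots.
function legacyNormalizePauses(text) {
  return text
    .replace(/\*\*(.*?)\*\*/g, '$1')
    .replace(/__(.*?)__/g, '$1')
    .replace(/`([^`]+)`/g, '$1')
    .replace(/[\u2013\u2014]/g, ', ')
    .replace(/\u2026/g, '...')
    .replace(/\s+/g, ' ')
    .replace(/\s+([,.;!?])/g, '$1')
    .replace(/([,.;!?])(?=[^\s])/g, '$1 ')
    .replace(/,\s*,+/g, ', ')
    .replace(/\s+/g, ' ')
    .trim();
}

console.log(legacyNormalizePauses('Well...I guess so.')); // "Well. . . I guess so."
console.log(normalizePauses('Well...I guess so.')); // "Well... I guess so."
console.log(normalizePauses('Hmm\u2026 okay')); // "Hmm... okay"
```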
@@ -100,15 +106,13 @@ function splitSpeechSegments(text) {
100
106
  .split(/(?<=[,;:])\s+/)
101
107
  .map((part) => part.trim())
102
108
  .filter(Boolean);
103
-
104
- if (clauseParts.length > 1) {
105
- for (let index = 0; index < clauseParts.length; index += 1) {
106
- const part = clauseParts[index];
107
- const needsComma = index < clauseParts.length - 1 && !/[.!?]$/.test(part);
108
- segments.push(needsComma ? `${part},` : part);
109
- }
110
- continue;
111
- }
109
+
110
+ if (clauseParts.length > 1) {
111
+ for (const part of clauseParts) {
112
+ segments.push(part);
113
+ }
114
+ continue;
115
+ }
112
116
 
113
117
  segments.push(segment);
114
118
  }
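The clause split uses the lookbehind `(?<=[,;:])`, so each part already ends with its own `,`, `;` or `:`. The removed loop still appended a comma to every non-final clause that did not end in `.`, `!` or `?`, which produced `First,,` or `it;,`. The sketch below isolates the clause step for comparison. The function names are hypothetical, and the rest of the `splitSpeechSegments` loop is omitted.

```javascript
// Clause step only; the sentence-level loop around it is not shown in the hunk.
function splitClauses(sentence) {
  return sentence
    .split(/(?<=[,;:])\s+/) // split after , ; : while keeping the mark on the left part
    .map((part) => part.trim())
    .filter(Boolean);
}

// Removed behavior: append a comma to non-final clauses lacking . ! ?
function legacyClauseSegments(sentence) {
  const clauseParts = splitClauses(sentence);
  return clauseParts.map((part, index) => {
    const needsComma = index < clauseParts.length - 1 && !/[.!?]$/.test(part);
    return needsComma ? `${part},` : part;
  });
}

// New behavior: parts are pushed unchanged.
function clauseSegments(sentence) {
  return splitClauses(sentence);
}

const sample = 'First, we load it; then we wait: done.';
console.log(legacyClauseSegments(sample)); // [ 'First,,', 'we load it;,', 'then we wait:,', 'done.' ]
console.log(clauseSegments(sample)); // [ 'First,', 'we load it;', 'then we wait:', 'done.' ]
```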
@@ -230,10 +234,10 @@ function normalizeSpokenMetadata(spoken) {
230
234
  return normalized;
231
235
  }
232
236
 
233
- function inferSpokenMetadataFromContent(content) {
234
- const text = normalizeSpeechText(trimString(content));
235
- if (!text) return null;
236
- const synthesized = synthesizeSpokenSegments(text);
237
+ function inferSpokenMetadataFromContent(content) {
238
+ const text = normalizeSpeechText(trimString(content));
239
+ if (!text) return null;
240
+ const synthesized = synthesizeSpokenSegments(text);
237
241
 
238
242
  const normalized = text.toLowerCase();
239
243
  const upbeat =
@@ -243,44 +247,44 @@ function inferSpokenMetadataFromContent(content) {
243
247
  /\b(sorry|gentle|softly|careful|reassuring|calm|okay|it'?s okay|i know)\b/.test(normalized);
244
248
  const curious = /\?/.test(text);
245
249
 
246
- if (upbeat) {
247
- return {
248
- text,
249
- language: synthesized?.language || 'English',
250
- segments: synthesized?.segments,
251
- instructions: 'Speak with warm, upbeat conversational energy and natural pacing.',
252
- style: { emotion: 'upbeat', energy: 'medium' },
253
- };
254
- }
255
-
256
- if (gentle) {
257
- return {
258
- text,
259
- language: synthesized?.language || 'English',
260
- segments: synthesized?.segments,
261
- instructions: 'Speak gently and reassuringly, with a calm pace and soft emphasis.',
262
- style: { emotion: 'gentle', energy: 'low' },
263
- };
264
- }
265
-
266
- if (curious) {
267
- return {
268
- text,
269
- language: synthesized?.language || 'English',
270
- segments: synthesized?.segments,
271
- instructions: 'Speak naturally with curious, engaged intonation and a conversational pace.',
272
- style: { emotion: 'curious', energy: 'medium' },
273
- };
274
- }
275
-
276
- return {
277
- text,
278
- language: synthesized?.language || 'English',
279
- segments: synthesized?.segments,
280
- instructions: 'Speak naturally with light warmth and conversational pacing.',
281
- style: { emotion: 'neutral', energy: 'medium' },
282
- };
283
- }
250
+ if (upbeat) {
251
+ return normalizeSpokenMetadata({
252
+ text,
253
+ language: synthesized?.language || 'English',
254
+ segments: synthesized?.segments,
255
+ instructions: 'Speak with warm, upbeat conversational energy and natural pacing.',
256
+ style: { emotion: 'upbeat', energy: 'medium' },
257
+ });
258
+ }
259
+
260
+ if (gentle) {
261
+ return normalizeSpokenMetadata({
262
+ text,
263
+ language: synthesized?.language || 'English',
264
+ segments: synthesized?.segments,
265
+ instructions: 'Speak gently and reassuringly, with a calm pace and soft emphasis.',
266
+ style: { emotion: 'gentle', energy: 'low' },
267
+ });
268
+ }
269
+
270
+ if (curious) {
271
+ return normalizeSpokenMetadata({
272
+ text,
273
+ language: synthesized?.language || 'English',
274
+ segments: synthesized?.segments,
275
+ instructions: 'Speak naturally with curious, engaged intonation and a conversational pace.',
276
+ style: { emotion: 'curious', energy: 'medium' },
277
+ });
278
+ }
279
+
280
+ return normalizeSpokenMetadata({
281
+ text,
282
+ language: synthesized?.language || 'English',
283
+ segments: synthesized?.segments,
284
+ instructions: 'Speak naturally with light warmth and conversational pacing.',
285
+ style: { emotion: 'neutral', energy: 'medium' },
286
+ });
287
+ }
284
288
 
285
289
  export {
286
290
  inferSpokenMetadataFromContent,
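In this version each tone branch of `inferSpokenMetadataFromContent` returns through `normalizeSpokenMetadata`, including the neutral fallback, so inferred metadata passes through the same normalizer as metadata supplied by the agent. The sketch below recasts the upbeat, gentle, curious, neutral cascade as an ordered rule table with a single normalizing exit. Several parts are assumptions. The upbeat pattern is a placeholder, because the real regex falls outside the hunk. `normalizeSpokenMetadataStub` is a hypothetical stand-in, since the real normalizer's body is not in this diff. The gentle pattern and the instruction strings are copied from the diff.

```javascript
// Hypothetical stand-in for normalizeSpokenMetadata: trims strings and
// drops undefined fields. The real normalizer is not shown in this diff.
function normalizeSpokenMetadataStub(spoken) {
  const out = {};
  for (const [key, value] of Object.entries(spoken)) {
    if (value === undefined) continue;
    out[key] = typeof value === 'string' ? value.trim() : value;
  }
  return out;
}

// Checked in the same order as the diff; the first match wins.
const TONE_RULES = [
  {
    emotion: 'upbeat',
    energy: 'medium',
    test: (lower) => /\b(great|awesome|congrats|excited)\b/.test(lower), // placeholder pattern
    instructions: 'Speak with warm, upbeat conversational energy and natural pacing.',
  },
  {
    emotion: 'gentle',
    energy: 'low',
    test: (lower) => /\b(sorry|gentle|softly|careful|reassuring|calm|okay|it'?s okay|i know)\b/.test(lower),
    instructions: 'Speak gently and reassuringly, with a calm pace and soft emphasis.',
  },
  {
    emotion: 'curious',
    energy: 'medium',
    test: (_lower, text) => /\?/.test(text),
    instructions: 'Speak naturally with curious, engaged intonation and a conversational pace.',
  },
];

const NEUTRAL = {
  emotion: 'neutral',
  energy: 'medium',
  instructions: 'Speak naturally with light warmth and conversational pacing.',
};

function inferTone(text, segments, language = 'English') {
  const lower = text.toLowerCase();
  const rule = TONE_RULES.find((r) => r.test(lower, text)) || NEUTRAL;
  // Single exit: every branch, including neutral, goes through the normalizer.
  return normalizeSpokenMetadataStub({
    text,
    language,
    segments,
    instructions: rule.instructions,
    style: { emotion: rule.emotion, energy: rule.energy },
  });
}
```

A table keeps the rule order explicit and leaves a single return path, so a later change to the normalizer cannot miss one of the branches.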
@@ -2,7 +2,7 @@
2
2
  "id": "oomi-ai",
3
3
  "name": "Oomi Channel Plugin",
4
4
  "description": "Managed Oomi channel integration for OpenClaw.",
5
- "version": "0.2.25",
5
+ "version": "0.2.28",
6
6
  "author": "Oomi",
7
7
  "license": "MIT",
8
8
  "openclawVersion": ">=0.5.0",
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "oomi-ai",
3
- "version": "0.2.25",
3
+ "version": "0.2.28",
4
4
  "description": "Oomi OpenClaw channel plugin and bridge tooling",
5
5
  "bin": {
6
6
  "oomi": "bin/oomi-ai.js"