oomi-ai 0.2.24 → 0.2.27

package/README.md CHANGED
@@ -4,12 +4,13 @@ OpenClaw channel plugin and bridge tooling for Oomi managed chat and voice.
4
4
 
5
5
  ## Current Focus
6
6
 
7
- `0.2.21` adds the first live persona automation lane:
7
+ `0.2.27` keeps the persona automation lane and adds a usable local managed-voice validation path:
8
8
  - WebSpatial-based persona scaffolding for generated Oomi apps
9
9
  - a high-level `oomi personas create-managed` command for agent-driven persona creation
10
10
  - device-authenticated persona runtime registration and job callbacks
11
11
  - automatic bridge-side polling for queued `persona_job` control messages
12
- - end-to-end local persona startup from a structured orchestration payload
12
+ - one shared spoken-metadata normalizer used by both the extension and the bridge
13
+ - a repo-backed local `tts-pipeline` replay that can validate assistant-final -> backend -> real Qwen TTS before publishing
13
14
 
14
15
  This package is for two audiences:
15
16
  - OpenClaw operators who need to connect a machine to Oomi and keep chat or voice healthy
@@ -148,6 +149,46 @@ For managed cloned-voice replies, the canonical contract is:
148
149
 
149
150
  The backend cloned-voice path is intentionally strict. If `metadata.spoken` does not reach Oomi, backend TTS fails instead of speaking a flat fallback voice.
150
151
 
152
+ ## Local TTS Validation
153
+
154
+ If you are developing this package inside the Oomi repo, you can now validate the managed voice path locally before publishing.
155
+
156
+ This local gate does three things:
157
+ - replays an assistant `chat.final` frame through the same spoken-metadata normalization path used by the OpenClaw extension and the bridge
158
+ - feeds that normalized frame into the Rails backend replay harness
159
+ - optionally calls the real Qwen cloned-voice provider and confirms that audio deltas come back
160
+
161
+ Important:
162
+ - this is a repo developer workflow, not a generic npm-only operator command
163
+ - it expects the Oomi repo checkout, the Rails backend, and local provider env vars
164
+ - the real-provider replay can auto-enroll a disposable default sample voice profile from `assets/voice/source/nemu-enrollment-sample.mp3`
165
+
166
+ Assistant-final contract only:
167
+
168
+ ```bash
169
+ oomi openclaw debug assistant-final --text "Hey Justin! How is the testing going?" --json
170
+ ```
171
+
172
+ Full local backend replay:
173
+
174
+ ```bash
175
+ oomi openclaw debug tts-pipeline --text "When your voice reaches me, it gets turned into text, I read it and think about it, then I speak back through the managed chat session." --json
176
+ ```
177
+
178
+ Real Qwen provider replay:
179
+
180
+ ```bash
181
+ oomi openclaw debug tts-pipeline --text "When your voice reaches me, it gets turned into text, I read it and think about it, then I speak back through the managed chat session." --live-provider --env-file .env.local --provider-timeout-ms 20000 --json
182
+ ```
183
+
184
+ What a good result looks like:
185
+ - `backend.success = true`
186
+ - `managed.assistantSpeechFinal.present = true`
187
+ - `qwen.errorCode = null`
188
+ - `qwen.audioDeltaCount > 0` when `--live-provider` is used
189
+
190
+ This is the preferred pre-publish gate for managed voice regressions, because it is much faster than publishing to npm and testing through a live OpenClaw machine first.
191
+
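The "good result" criteria listed in this README addition can be checked mechanically when the command is run with `--json`. The following is a minimal sketch, not part of the package: `checkTtsPipelineResult` is a hypothetical helper, and it assumes the JSON report nests the backend fields under a top-level `backend` key next to a `liveProvider` flag, as the CLI's result printer in this diff suggests. The sample report is hand-written for illustration and is not real command output.

```javascript
// Hypothetical helper: returns a list of problems; an empty list means the gate passed.
function checkTtsPipelineResult(report) {
  const backend = report && report.backend ? report.backend : {};
  const qwen = backend.qwen || {};
  const problems = [];

  if (backend.success !== true) {
    problems.push('backend.success is not true');
  }
  if (backend.managed?.assistantSpeechFinal?.present !== true) {
    problems.push('managed.assistantSpeechFinal.present is not true');
  }
  // `!= null` treats both null and undefined as "no error".
  if (qwen.errorCode != null) {
    problems.push('qwen.errorCode is ' + qwen.errorCode);
  }
  // Audio deltas are only expected when the real provider was used.
  if (report?.liveProvider && !(Number(qwen.audioDeltaCount) > 0)) {
    problems.push('qwen.audioDeltaCount is not greater than 0');
  }
  return problems;
}

// Hand-written sample report (illustrative values only).
const sampleReport = {
  liveProvider: true,
  backend: {
    success: true,
    managed: { assistantSpeechFinal: { present: true } },
    qwen: { errorCode: null, audioDeltaCount: 12 },
  },
};

console.log(checkTtsPipelineResult(sampleReport)); // prints []
```

In a pre-publish script, a non-empty list could be turned into a non-zero exit code so the gate fails loudly.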
151
192
  ## Persona Scaffolding
152
193
 
153
194
  Use the scaffold flow when OpenClaw needs to build a managed persona app that will live inside Oomi:
@@ -192,7 +233,7 @@ The bridge status file is written locally and should roughly be interpreted as:
192
233
 
193
234
  For voice support, a `voice_session_*` failure should be treated as narrower than a full provider outage.
194
235
 
195
- ## Troubleshooting
236
+ ## Troubleshooting
196
237
 
197
238
  ### `invalid handshake: first request must be connect`
198
239
 
@@ -223,19 +264,32 @@ What to check:
223
264
 
224
265
  If the process is alive but runtime faults are being caught, expect `degraded` rather than an immediate hard stop.
225
266
 
226
- ### Voice STT works but the agent does not answer
267
+ ### Voice STT works but the agent does not answer
227
268
 
228
269
  This usually means one of these:
229
270
  - the managed gateway/device side is not actually ready
230
271
  - the bridge or agent run failed after delivery
231
272
  - the OpenClaw run stopped with an upstream provider `network_error`
232
273
 
233
- In that situation, inspect:
234
- - `~/.openclaw/logs/gateway.log`
235
- - `~/.openclaw/logs/gateway.err.log`
236
- - the relevant session JSONL in `~/.openclaw/agents/main/sessions/`
274
+ In that situation, inspect:
275
+ - `~/.openclaw/logs/gateway.log`
276
+ - `~/.openclaw/logs/gateway.err.log`
277
+ - the relevant session JSONL in `~/.openclaw/agents/main/sessions/`
278
+
279
+ ### Voice text works but cloned TTS fails with `MISSING_SPOKEN_METADATA`
280
+
281
+ Meaning:
282
+ - the assistant text arrived
283
+ - the backend voice relay never received valid hidden `metadata.spoken`
284
+
285
+ What to check:
286
+ - run the local replay gate before publishing:
287
+ - `oomi openclaw debug assistant-final --text "..."`
288
+ - `oomi openclaw debug tts-pipeline --text "..."`
289
+ - if the package local replay succeeds but the live machine fails, verify the OpenClaw machine is actually running the updated bridge binary
290
+ - if the local replay fails, fix the assistant-final contract first instead of debugging the browser or backend deployment
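To make the `MISSING_SPOKEN_METADATA` case above concrete, here is a minimal sketch that reports whether a `chat.final` frame carries `metadata.spoken`. The frame shape (`payload.message.metadata.spoken`) follows the debug-frame builder and extractor in this diff. `describeSpokenMetadata` is a hypothetical name, and treating a non-empty `spoken.text` string as "present" is an assumption, since the package's own `normalizeSpokenMetadata` is not shown here.

```javascript
// Hypothetical helper: classify a raw gateway frame by its spoken metadata.
function describeSpokenMetadata(frameText) {
  let frame;
  try {
    frame = JSON.parse(frameText);
  } catch {
    return 'unparseable';
  }
  const payload = frame && typeof frame === 'object' ? frame.payload : null;
  if (!payload || frame.event !== 'chat' || payload.state !== 'final') {
    return 'not-chat-final';
  }
  const spoken = payload.message?.metadata?.spoken;
  // Assumption: a usable spoken block is an object with non-empty text.
  const hasText =
    Boolean(spoken) &&
    typeof spoken === 'object' &&
    typeof spoken.text === 'string' &&
    spoken.text.trim() !== '';
  return hasText ? 'present' : 'missing';
}

// A bare assistant final frame, shaped like the CLI's debug frame.
const bareFrame = JSON.stringify({
  type: 'event',
  event: 'chat',
  payload: {
    sessionKey: 'agent:main:webchat:channel:oomi',
    state: 'final',
    message: { role: 'assistant', content: 'Hello there.' },
  },
});

// The same frame with a spoken block attached (illustrative values).
const spokenFrame = JSON.stringify({
  type: 'event',
  event: 'chat',
  payload: {
    sessionKey: 'agent:main:webchat:channel:oomi',
    state: 'final',
    message: {
      role: 'assistant',
      content: 'Hello there.',
      metadata: { spoken: { text: 'Hello there.', segments: [] } },
    },
  },
});

console.log(describeSpokenMetadata(bareFrame));   // prints missing
console.log(describeSpokenMetadata(spokenFrame)); // prints present
```

A frame that reports `missing` at the bridge boundary is the kind of input the backend voice relay would be expected to reject.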
237
291
 
238
- ## Developer Notes
292
+ ## Developer Notes
239
293
 
240
294
  If you are inspecting this package on npm, the main architectural points are:
241
295
  - the extension path is the stable managed text contract
@@ -248,24 +302,38 @@ If you are inspecting this package on npm, the main architectural points are:
248
302
  - runtime fault isolation so local session failures are less likely to crash the whole provider
249
303
  - one shared hidden managed-voice speech metadata helper used by both the extension and the local bridge
250
304
 
251
- If you are developing the plugin, test the packaged surface with:
252
-
253
- ```bash
254
- cd packages/oomi-ai
255
- node --test test/*.test.mjs
256
- npm pack --dry-run
257
- ```
258
-
259
- ## Release Process
260
-
261
- Before publishing:
262
-
263
- ```bash
264
- cd packages/oomi-ai
265
- node --test test/*.test.mjs
266
- npm pack --dry-run
267
- ```
268
-
305
+ If you are developing the plugin, test the packaged surface with:
306
+
307
+ ```bash
308
+ cd packages/oomi-ai
309
+ node --test test/*.test.mjs
310
+ npm pack --dry-run
311
+ ```
312
+
313
+ For managed voice changes, do not stop at the package tests. Run the local replay gate from the repo root as well, especially before publishing:
314
+
315
+ ```bash
316
+ oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --json
317
+ oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --live-provider --env-file .env.local --provider-timeout-ms 20000 --json
318
+ ```
319
+
320
+ ## Release Process
321
+
322
+ Before publishing:
323
+
324
+ ```bash
325
+ cd packages/oomi-ai
326
+ node --test test/*.test.mjs
327
+ npm pack --dry-run
328
+ ```
329
+
330
+ For voice-related changes, also run the repo-backed local replay gate before publish:
331
+
332
+ ```bash
333
+ oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --json
334
+ oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --live-provider --env-file .env.local --provider-timeout-ms 20000 --json
335
+ ```
336
+
269
337
  Then publish the bumped version:
270
338
 
271
339
  ```bash
package/bin/oomi-ai.js CHANGED
@@ -39,13 +39,21 @@ const BRIDGE_CONNECT_CHALLENGE_TIMEOUT_MS = parsePositiveInteger(
39
39
  process.env.OOMI_BRIDGE_CONNECT_CHALLENGE_TIMEOUT_MS,
40
40
  3000
41
41
  );
42
- const BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS = parsePositiveInteger(
43
- process.env.OOMI_BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS,
44
- 30000
45
- );
46
- const BRIDGE_LAUNCHD_LABEL = 'ai.oomi.bridge';
47
- const DEVICE_IDENTITY_PATH = path.join(os.homedir(), '.openclaw', 'identity', 'device.json');
48
- const ED25519_SPKI_PREFIX = Buffer.from('302a300506032b6570032100', 'hex');
42
+ const BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS = parsePositiveInteger(
43
+ process.env.OOMI_BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS,
44
+ 30000
45
+ );
46
+ const BRIDGE_LAUNCHD_LABEL = 'ai.oomi.bridge';
47
+ const DEBUG_PROVIDER_ENV_KEYS = [
48
+ 'QWEN_REALTIME_API_KEY',
49
+ 'QWEN_REALTIME_BASE_URL',
50
+ 'QWEN_REALTIME_ASR_MODEL',
51
+ 'QWEN_REALTIME_TTS_MODEL',
52
+ 'QWEN_REALTIME_TTS_VOICE',
53
+ 'QWEN_REALTIME_LANGUAGE',
54
+ ];
55
+ const DEVICE_IDENTITY_PATH = path.join(os.homedir(), '.openclaw', 'identity', 'device.json');
56
+ const ED25519_SPKI_PREFIX = Buffer.from('302a300506032b6570032100', 'hex');
49
57
 
50
58
  function parsePositiveInteger(value, fallback) {
51
59
  const num = Number(value);
@@ -169,10 +177,14 @@ Commands:
169
177
  openclaw install
170
178
  Install agent instructions and the Oomi skill into OpenClaw.
171
179
 
172
- openclaw bridge [start|ensure|stop|restart|ps]
173
- Manage local OpenClaw-to-Oomi bridge lifecycle (singleton).
174
- openclaw bridge service [install|start|stop|restart|status|uninstall]
175
- Manage macOS launchd bridge supervision.
180
+ openclaw bridge [start|ensure|stop|restart|ps]
181
+ Manage local OpenClaw-to-Oomi bridge lifecycle (singleton).
182
+ openclaw bridge service [install|start|stop|restart|status|uninstall]
183
+ Manage macOS launchd bridge supervision.
184
+ openclaw debug assistant-final
185
+ Replay an assistant chat.final frame through spoken-metadata normalization.
186
+ openclaw debug tts-pipeline
187
+ Replay an assistant chat.final through local backend voice handling.
176
188
 
177
189
  openclaw pair
178
190
  Pair this OpenClaw host with Oomi and start bridge (single command).
@@ -225,9 +237,20 @@ Common flags:
225
237
  --device-id ID Bridge device identifier (default: host name)
226
238
  --device-token TOKEN Existing bridge device token
227
239
  --show-secrets Print full token values in diagnostic output
228
- --json Print pairing result as JSON (for automation)
229
- --backend-url URL Override Oomi backend URL
230
- --root PATH Override repo root path for persona discovery
240
+ --json Print pairing result as JSON (for automation)
241
+ --text TEXT Assistant text for local debug frame replay
242
+ --frame-file PATH Read a raw gateway frame from disk for local debug replay
243
+ --frame-json JSON Use raw gateway frame JSON text for local debug replay
244
+ --session-id ID Debug session id override (default: ms_debug_local)
245
+ --user-text TEXT User utterance text used for backend voice replay
246
+ --live-provider Use the real Qwen TTS provider in local debug replay
247
+ --env-file PATH Load provider env vars from a specific env file (default: <repo>/.env.local)
248
+ --provider-timeout-ms N
249
+ Timeout in ms for live provider audio during local debug replay
250
+ --backend-url URL Override Oomi backend URL
251
+ --root PATH Override repo root path for persona discovery
252
+ --role ROLE Message role override for local debug frame replay
253
+ --omit-role Omit message.role in the generated local debug frame
231
254
  --name NAME Persona display name (for create)
232
255
  --description TEXT Persona description (for scaffold)
233
256
  --slug SLUG Explicit slug override (for create-managed)
@@ -261,13 +284,43 @@ function readFile(filePath) {
261
284
  return fs.readFileSync(filePath, 'utf-8');
262
285
  }
263
286
 
264
- function writeFile(filePath, content, options = undefined) {
265
- fs.writeFileSync(filePath, content, options);
266
- }
267
-
268
- function xmlEscape(value) {
269
- return String(value)
270
- .replaceAll('&', '&amp;')
287
+ function writeFile(filePath, content, options = undefined) {
288
+ fs.writeFileSync(filePath, content, options);
289
+ }
290
+
291
+ function parseDotEnvLine(line) {
292
+ const trimmed = String(line || '').trim();
293
+ if (!trimmed || trimmed.startsWith('#')) return null;
294
+ const separatorIndex = trimmed.indexOf('=');
295
+ if (separatorIndex <= 0) return null;
296
+ const key = trimmed.slice(0, separatorIndex).trim();
297
+ if (!key) return null;
298
+ let value = trimmed.slice(separatorIndex + 1).trim();
299
+ if ((value.startsWith('"') && value.endsWith('"')) || (value.startsWith("'") && value.endsWith("'"))) {
300
+ value = value.slice(1, -1);
301
+ }
302
+ return { key, value };
303
+ }
304
+
305
+ function loadEnvFile(filePath, keys = []) {
306
+ if (!filePath || !fs.existsSync(filePath)) {
307
+ throw new Error(`Environment file not found: ${filePath}`);
308
+ }
309
+ const selectedKeys = Array.isArray(keys) && keys.length ? new Set(keys) : null;
310
+ const entries = {};
311
+ const lines = readFile(filePath).split(/\r?\n/);
312
+ for (const line of lines) {
313
+ const parsed = parseDotEnvLine(line);
314
+ if (!parsed) continue;
315
+ if (selectedKeys && !selectedKeys.has(parsed.key)) continue;
316
+ entries[parsed.key] = parsed.value;
317
+ }
318
+ return entries;
319
+ }
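The dotenv handling added in this hunk is deliberately small, so it helps to see what it accepts. The sketch below is a standalone copy of `parseDotEnvLine` as it appears in this diff, run against a few made-up sample lines (the values are placeholders, not real configuration). It shows that surrounding quotes are stripped, while inline comments and `export` prefixes get no special handling.

```javascript
// Standalone copy of parseDotEnvLine from this hunk, for illustration.
function parseDotEnvLine(line) {
  const trimmed = String(line || '').trim();
  if (!trimmed || trimmed.startsWith('#')) return null;
  const separatorIndex = trimmed.indexOf('=');
  if (separatorIndex <= 0) return null;
  const key = trimmed.slice(0, separatorIndex).trim();
  if (!key) return null;
  let value = trimmed.slice(separatorIndex + 1).trim();
  if (
    (value.startsWith('"') && value.endsWith('"')) ||
    (value.startsWith("'") && value.endsWith("'"))
  ) {
    value = value.slice(1, -1);
  }
  return { key, value };
}

// Made-up sample lines; values are placeholders.
const samples = [
  'QWEN_REALTIME_LANGUAGE="English"',          // quotes are stripped
  '# a full-line comment',                     // skipped (null)
  'QWEN_REALTIME_TTS_VOICE=demo # trailing',   // inline comment stays in the value
  'export QWEN_REALTIME_API_KEY=placeholder',  // "export " becomes part of the key
];

for (const line of samples) {
  console.log(JSON.stringify(line), '->', JSON.stringify(parseDotEnvLine(line)));
}
```

Because `loadEnvFile` filters on exact key names when a key list is passed, a line written with an `export` prefix would be skipped by that filter rather than loaded.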
320
+
321
+ function xmlEscape(value) {
322
+ return String(value)
323
+ .replaceAll('&', '&amp;')
271
324
  .replaceAll('<', '&lt;')
272
325
  .replaceAll('>', '&gt;')
273
326
  .replaceAll('"', '&quot;')
@@ -356,9 +409,9 @@ function ensureDir(dirPath) {
356
409
  }
357
410
  }
358
411
 
359
- function findRepoRoot(startDir) {
360
- let current = startDir;
361
- for (let i = 0; i < 6; i += 1) {
412
+ function findRepoRoot(startDir) {
413
+ let current = startDir;
414
+ for (let i = 0; i < 6; i += 1) {
362
415
  const personasDir = path.join(current, 'personas');
363
416
  const skillsDir = path.join(current, 'skills', 'oomi');
364
417
  if (fs.existsSync(personasDir) || fs.existsSync(skillsDir)) {
@@ -367,11 +420,23 @@ function findRepoRoot(startDir) {
367
420
  const parent = path.dirname(current);
368
421
  if (parent === current) break;
369
422
  current = parent;
370
- }
371
- return null;
372
- }
373
-
374
- function resolveSkillSource(cliRoot) {
423
+ }
424
+ return null;
425
+ }
426
+
427
+ function resolveRepoRoot(rootFlag) {
428
+ const explicitRoot =
429
+ typeof rootFlag === 'string' && rootFlag.trim()
430
+ ? path.resolve(rootFlag.trim())
431
+ : '';
432
+ const repoRoot = explicitRoot || findRepoRoot(process.cwd()) || findRepoRoot(PACKAGE_ROOT);
433
+ if (!repoRoot) {
434
+ throw new Error('Could not locate repo root. Use --root <repo root>.');
435
+ }
436
+ return repoRoot;
437
+ }
438
+
439
+ function resolveSkillSource(cliRoot) {
375
440
  const packaged = path.join(PACKAGE_ROOT, 'skills', 'oomi');
376
441
  if (fs.existsSync(packaged)) {
377
442
  return packaged;
@@ -1698,7 +1763,7 @@ function summarizeVoiceFrameContract(frameText) {
1698
1763
  };
1699
1764
  }
1700
1765
 
1701
- function ensureVoiceAssistantSpokenMetadata(frameText) {
1766
+ function ensureAssistantSpokenMetadata(frameText) {
1702
1767
  const frame = parseJsonPayload(frameText);
1703
1768
  if (!frame || typeof frame !== 'object') {
1704
1769
  return { frameText, changed: false, reason: '' };
@@ -1753,6 +1818,395 @@ function ensureVoiceAssistantSpokenMetadata(frameText) {
1753
1818
  reason: normalizedExplicitSpoken ? 'normalized' : (messageRole ? 'synthesized' : 'synthesized_missing_role'),
1754
1819
  };
1755
1820
  }
1821
+
1822
+ function normalizeAssistantGatewayFrame(sessionId, frameText) {
1823
+ const scope = classifyBridgeSessionScope(sessionId);
1824
+ const summary = summarizeVoiceFrameContract(frameText);
1825
+ if (!summary.parseable || summary.event !== 'chat' || summary.state !== 'final') {
1826
+ return {
1827
+ frameText,
1828
+ changed: false,
1829
+ reason: '',
1830
+ scope,
1831
+ summary,
1832
+ };
1833
+ }
1834
+
1835
+ const normalized = ensureAssistantSpokenMetadata(frameText);
1836
+ return {
1837
+ ...normalized,
1838
+ scope,
1839
+ summary,
1840
+ };
1841
+ }
1842
+
1843
+ function buildAssistantFinalDebugFrame({ sessionKey, text, role }) {
1844
+ const trimmedSessionKey =
1845
+ typeof sessionKey === 'string' && sessionKey.trim()
1846
+ ? sessionKey.trim()
1847
+ : 'agent:main:webchat:channel:oomi';
1848
+ const message = {
1849
+ content: String(text || ''),
1850
+ };
1851
+ if (typeof role === 'string' && role.trim()) {
1852
+ message.role = role.trim();
1853
+ }
1854
+ return JSON.stringify({
1855
+ type: 'event',
1856
+ event: 'chat',
1857
+ payload: {
1858
+ sessionKey: trimmedSessionKey,
1859
+ state: 'final',
1860
+ message,
1861
+ },
1862
+ });
1863
+ }
1864
+
1865
+ function extractSpokenMetadata(frameText) {
1866
+ const payload = parseJsonPayload(frameText);
1867
+ const message =
1868
+ payload &&
1869
+ payload.payload &&
1870
+ typeof payload.payload === 'object' &&
1871
+ payload.payload.message &&
1872
+ typeof payload.payload.message === 'object'
1873
+ ? payload.payload.message
1874
+ : null;
1875
+ const metadata =
1876
+ message &&
1877
+ message.metadata &&
1878
+ typeof message.metadata === 'object' &&
1879
+ !Array.isArray(message.metadata)
1880
+ ? message.metadata
1881
+ : {};
1882
+ return normalizeSpokenMetadata(metadata.spoken);
1883
+ }
1884
+
1885
+ function runAssistantFinalDebugCheck(options = {}) {
1886
+ const sessionId =
1887
+ typeof options.sessionId === 'string' && options.sessionId.trim()
1888
+ ? options.sessionId.trim()
1889
+ : 'ms_debug_local';
1890
+ const sessionKey =
1891
+ typeof options.sessionKey === 'string' && options.sessionKey.trim()
1892
+ ? options.sessionKey.trim()
1893
+ : 'agent:main:webchat:channel:oomi';
1894
+ const role =
1895
+ options.omitRole
1896
+ ? ''
1897
+ : (typeof options.role === 'string' && options.role.trim() ? options.role.trim() : 'assistant');
1898
+
1899
+ const rawFrameText =
1900
+ typeof options.frameText === 'string' && options.frameText.trim()
1901
+ ? options.frameText
1902
+ : buildAssistantFinalDebugFrame({
1903
+ sessionKey,
1904
+ text: options.text,
1905
+ role,
1906
+ });
1907
+
1908
+ const before = summarizeVoiceFrameContract(rawFrameText);
1909
+ const normalized = normalizeAssistantGatewayFrame(sessionId, rawFrameText);
1910
+ const after = summarizeVoiceFrameContract(normalized.frameText);
1911
+ const spoken = extractSpokenMetadata(normalized.frameText);
1912
+
1913
+ return {
1914
+ sessionId,
1915
+ sessionKey,
1916
+ scope: normalized.scope,
1917
+ changed: normalized.changed,
1918
+ reason: normalized.reason,
1919
+ before,
1920
+ after,
1921
+ spoken,
1922
+ frameText: normalized.frameText,
1923
+ };
1924
+ }
1925
+
1926
+ function printAssistantFinalDebugResult(result, asJson) {
1927
+ if (asJson) {
1928
+ console.log(JSON.stringify(result, null, 2));
1929
+ return;
1930
+ }
1931
+
1932
+ console.log(`Session id: ${result.sessionId}`);
1933
+ console.log(`Session key: ${result.sessionKey}`);
1934
+ console.log(`Scope: ${result.scope}`);
1935
+ console.log(`Changed: ${result.changed ? 'yes' : 'no'}${result.reason ? ` (${result.reason})` : ''}`);
1936
+ console.log(
1937
+ `Before: event=${result.before.event || '<none>'} state=${result.before.state || '<none>'} role=${result.before.role || '<none>'} spoken=${result.before.spokenNormalized ? 'yes' : 'no'}`
1938
+ );
1939
+ console.log(
1940
+ `After: event=${result.after.event || '<none>'} state=${result.after.state || '<none>'} role=${result.after.role || '<none>'} spoken=${result.after.spokenNormalized ? 'yes' : 'no'}`
1941
+ );
1942
+ if (result.spoken) {
1943
+ console.log(`Spoken text: ${result.spoken.text}`);
1944
+ console.log(`Segments: ${Array.isArray(result.spoken.segments) ? result.spoken.segments.length : 0}`);
1945
+ if (typeof result.spoken.instructions === 'string' && result.spoken.instructions.trim()) {
1946
+ console.log(`Instructions: ${result.spoken.instructions}`);
1947
+ }
1948
+ } else {
1949
+ console.log('Spoken text: <missing>');
1950
+ }
1951
+ }
1952
+
1953
+ function resolveCommandFromPath(commandName) {
1954
+ const normalized = String(commandName || '').trim();
1955
+ if (!normalized) return '';
1956
+ try {
1957
+ const probe = spawnSync(process.platform === 'win32' ? 'where' : 'which', [normalized], {
1958
+ encoding: 'utf8',
1959
+ stdio: ['ignore', 'pipe', 'ignore'],
1960
+ });
1961
+ if (probe.status !== 0) return '';
1962
+ const firstLine = String(probe.stdout || '')
1963
+ .split(/\r?\n/)
1964
+ .map((line) => line.trim())
1965
+ .find(Boolean);
1966
+ return firstLine || '';
1967
+ } catch {
1968
+ return '';
1969
+ }
1970
+ }
1971
+
1972
+ function resolveExecutable(candidates = []) {
1973
+ for (const candidate of candidates) {
1974
+ if (!candidate) continue;
1975
+ const value = String(candidate).trim();
1976
+ if (!value) continue;
1977
+ if (path.isAbsolute(value) && fs.existsSync(value)) {
1978
+ return value;
1979
+ }
1980
+ if (value.includes(path.sep) || value.includes('/')) {
1981
+ const resolved = path.resolve(value);
1982
+ if (fs.existsSync(resolved)) {
1983
+ return resolved;
1984
+ }
1985
+ continue;
1986
+ }
1987
+ const fromPath = resolveCommandFromPath(value);
1988
+ if (fromPath) {
1989
+ return fromPath;
1990
+ }
1991
+ }
1992
+ return '';
1993
+ }
1994
+
1995
+ function resolveBackendRoot(rootFlag) {
1996
+ const repoRoot = resolveRepoRoot(rootFlag);
1997
+ const backendRoot = path.join(repoRoot, 'apps', 'backend');
1998
+ if (!fs.existsSync(backendRoot)) {
1999
+ throw new Error(`Could not locate backend app at ${backendRoot}`);
2000
+ }
2001
+ return backendRoot;
2002
+ }
2003
+
2004
+ function resolveRubyExecutable() {
2005
+ const candidates = [
2006
+ process.env.OOMI_RUBY_BIN,
2007
+ process.env.RUBY,
2008
+ process.platform === 'win32' ? 'ruby.exe' : 'ruby',
2009
+ process.platform === 'win32' ? 'ruby' : '',
2010
+ process.platform === 'win32' ? 'C:\\Ruby33-x64\\bin\\ruby.exe' : '',
2011
+ ];
2012
+ const executable = resolveExecutable(candidates);
2013
+ if (!executable) {
2014
+ throw new Error('Ruby executable not found. Set OOMI_RUBY_BIN or install Ruby locally.');
2015
+ }
2016
+ return executable;
2017
+ }
2018
+
2019
+ function resolveBundleExecutable() {
2020
+ const candidates = [
2021
+ process.env.OOMI_BUNDLE_BIN,
2022
+ process.platform === 'win32' ? 'bundle.bat' : 'bundle',
2023
+ 'bundle',
2024
+ process.platform === 'win32' ? 'C:\\Ruby33-x64\\bin\\bundle.bat' : '',
2025
+ ];
2026
+ const executable = resolveExecutable(candidates);
2027
+ if (!executable) {
2028
+ throw new Error('Bundler executable not found. Set OOMI_BUNDLE_BIN or install Bundler locally.');
2029
+ }
2030
+ return executable;
2031
+ }
2032
+
2033
+ function shellQuote(value) {
2034
+ const text = String(value);
2035
+ if (process.platform === 'win32') {
2036
+ return `"${text.replace(/"/g, '""')}"`;
2037
+ }
2038
+ return `'${text.replace(/'/g, `'\\''`)}'`;
2039
+ }
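The quoting rules in `shellQuote` are easier to follow with concrete inputs. The sketch below is a standalone variant that takes the platform as a second parameter; that parameter is an assumption added here so both branches can be exercised on any machine, whereas the function in this diff reads `process.platform` directly.

```javascript
// Variant of shellQuote with an explicit platform argument (hypothetical signature).
function shellQuoteFor(value, platform) {
  const text = String(value);
  if (platform === 'win32') {
    // Wrap in double quotes and double any embedded double quote.
    return '"' + text.replace(/"/g, '""') + '"';
  }
  // POSIX style: wrap in single quotes; close, escape, and reopen around each '.
  return "'" + text.replace(/'/g, "'\\''") + "'";
}

console.log(shellQuoteFor("it's here", 'linux'));  // prints 'it'\''s here'
console.log(shellQuoteFor('say "hi"', 'win32'));   // prints "say ""hi"""
```

In the diff, only the Windows branch joins quoted arguments into one command string; on other platforms the arguments are passed to `spawn` as an array, so no shell quoting is involved there.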
2040
+
2041
+ async function runBundledRubyScript({ backendRoot, scriptPath, inputFile, env = undefined }) {
2042
+ const rubyExecutable = resolveRubyExecutable();
2043
+ const bundleExecutable = resolveBundleExecutable();
2044
+ const commandText = process.platform === 'win32'
2045
+ ? [bundleExecutable, 'exec', rubyExecutable, scriptPath, '--input-file', inputFile].map(shellQuote).join(' ')
2046
+ : '';
2047
+ const childEnv = env ? { ...process.env, ...env } : process.env;
2048
+
2049
+ return await new Promise((resolve, reject) => {
2050
+ const child = process.platform === 'win32'
2051
+ ? spawn(commandText, [], {
2052
+ cwd: backendRoot,
2053
+ shell: true,
2054
+ env: childEnv,
2055
+ stdio: ['ignore', 'pipe', 'pipe'],
2056
+ })
2057
+ : spawn(bundleExecutable, ['exec', rubyExecutable, scriptPath, '--input-file', inputFile], {
2058
+ cwd: backendRoot,
2059
+ env: childEnv,
2060
+ stdio: ['ignore', 'pipe', 'pipe'],
2061
+ });
2062
+
2063
+ let stdout = '';
2064
+ let stderr = '';
2065
+ child.stdout.on('data', (chunk) => {
2066
+ stdout += chunk.toString();
2067
+ });
2068
+ child.stderr.on('data', (chunk) => {
2069
+ stderr += chunk.toString();
2070
+ });
2071
+ child.on('error', reject);
2072
+ child.on('close', (code) => {
2073
+ resolve({ code: Number(code || 0), stdout, stderr });
2074
+ });
2075
+ });
2076
+ }
2077
+
2078
+ async function runLocalTtsPipelineDebugCheck(options = {}) {
2079
+ const assistant = runAssistantFinalDebugCheck(options);
2080
+ const repoRoot = resolveRepoRoot(options.root);
2081
+ const backendRoot = resolveBackendRoot(options.root);
2082
+ const scriptPath = path.join(backendRoot, 'bin', 'voice_tts_replay.rb');
2083
+ if (!fs.existsSync(scriptPath)) {
2084
+ throw new Error(`Backend replay script not found: ${scriptPath}`);
2085
+ }
2086
+
2087
+ const inputPayload = {
2088
+ repoRoot,
2089
+ sessionId: assistant.sessionId,
2090
+ sessionKey: assistant.sessionKey,
2091
+ frameText: assistant.frameText,
2092
+ userText:
2093
+ typeof options.userText === 'string' && options.userText.trim()
2094
+ ? options.userText.trim()
2095
+ : 'local debug utterance',
2096
+ liveProvider: Boolean(options.liveProvider),
2097
+ providerTimeoutMs: parsePositiveInteger(options.providerTimeoutMs, 15000),
2098
+ };
2099
+ let childEnv = undefined;
2100
+ let resolvedEnvFile = '';
2101
+ if (options.liveProvider) {
2102
+ resolvedEnvFile =
2103
+ typeof options.envFile === 'string' && options.envFile.trim()
2104
+ ? path.resolve(options.envFile.trim())
2105
+ : path.join(repoRoot, '.env.local');
2106
+ childEnv = loadEnvFile(resolvedEnvFile, DEBUG_PROVIDER_ENV_KEYS);
2107
+ }
2108
+ const inputFile = path.join(os.tmpdir(), `oomi-voice-replay-${randomUUID()}.json`);
2109
+ writeFile(inputFile, JSON.stringify(inputPayload, null, 2) + '\n');
2110
+
2111
+ try {
2112
+ const backend = await runBundledRubyScript({ backendRoot, scriptPath, inputFile, env: childEnv });
2113
+ const parsed = backend.stdout.trim() ? JSON.parse(backend.stdout) : null;
2114
+ return {
2115
+ assistant,
2116
+ backend: parsed,
2117
+ backendExitCode: backend.code,
2118
+ backendStderr: backend.stderr.trim(),
2119
+ liveProvider: Boolean(options.liveProvider),
2120
+ envFile: resolvedEnvFile || null,
2121
+ };
2122
+ } finally {
2123
+ try {
2124
+ fs.unlinkSync(inputFile);
2125
+ } catch {
2126
+ // no-op
2127
+ }
2128
+ }
2129
+ }
2130
+
2131
+ function printTtsPipelineDebugResult(result, asJson) {
2132
+ if (asJson) {
2133
+ console.log(JSON.stringify(result, null, 2));
2134
+ return;
2135
+ }
2136
+
2137
+ console.log(`Assistant normalization: ${result.assistant.changed ? 'changed' : 'unchanged'}${result.assistant.reason ? ` (${result.assistant.reason})` : ''}`);
2138
+ console.log(`Assistant spoken segments: ${Array.isArray(result.assistant.spoken?.segments) ? result.assistant.spoken.segments.length : 0}`);
2139
+ if (!result.backend) {
2140
+ console.log('Backend replay: <no output>');
2141
+ return;
2142
+ }
2143
+ console.log(`Backend replay success: ${result.backend.success ? 'yes' : 'no'}`);
2144
+ console.log(`Managed speech sidecar: ${result.backend.managed?.assistantSpeechFinal?.present ? 'yes' : 'no'}`);
2145
+ console.log(`Backend final text: ${result.backend.qwen?.assistantTextFinal || '<missing>'}`);
2146
+ console.log(`Backend TTS appends: ${Array.isArray(result.backend.qwen?.ttsAppends) ? result.backend.qwen.ttsAppends.length : 0}`);
2147
+ console.log(`Backend TTS commits: ${Number(result.backend.qwen?.commitCount || 0)}`);
2148
+ if (result.liveProvider) {
2149
+ console.log(`Live provider audio deltas: ${Number(result.backend.qwen?.audioDeltaCount || 0)}`);
2150
+ console.log(`Live provider audio bytes (base64): ${Number(result.backend.qwen?.audioDeltaBytes || 0)}`);
2151
+ console.log(`Live provider timeout: ${result.backend.qwen?.providerTimedOut ? 'yes' : 'no'}`);
2152
+ }
2153
+ if (result.backend.qwen?.errorCode) {
2154
+ console.log(`Backend error: ${result.backend.qwen.errorCode}`);
2155
+ }
2156
+ if (result.backendStderr) {
2157
+ console.log(`Backend stderr: ${result.backendStderr}`);
2158
+ }
2159
+ }
2160
+
2161
+ async function handleOpenclawDebugCommand(action, flags) {
2162
+ const normalizedAction = String(action || '').trim().toLowerCase();
2163
+ const frameFile =
2164
+ typeof flags['frame-file'] === 'string' && flags['frame-file'].trim()
2165
+ ? path.resolve(flags['frame-file'])
2166
+ : '';
2167
+ const frameText =
2168
+ frameFile
2169
+ ? readFile(frameFile)
2170
+ : (typeof flags['frame-json'] === 'string' && flags['frame-json'].trim() ? flags['frame-json'] : '');
2171
+ const text = typeof flags.text === 'string' ? flags.text : '';
2172
+
2173
+ if (!frameText && !text.trim()) {
2174
+ throw new Error(
2175
+ 'Assistant text or frame input is required. Usage: oomi openclaw debug assistant-final --text "<assistant text>"'
2176
+ );
2177
+ }
2178
+
2179
+ const debugOptions = {
2180
+ sessionId: flags['session-id'],
2181
+ sessionKey: flags['session-key'],
2182
+ role: flags.role,
2183
+ omitRole: isTruthyFlag(flags['omit-role']),
2184
+ text,
2185
+ frameText,
2186
+ root: flags.root,
2187
+ userText: flags['user-text'],
2188
+ liveProvider: isTruthyFlag(flags['live-provider']),
2189
+ envFile: flags['env-file'],
2190
+ providerTimeoutMs: flags['provider-timeout-ms'],
2191
+ };
2192
+
2193
+ if (normalizedAction === 'assistant-final') {
2194
+ const result = runAssistantFinalDebugCheck(debugOptions);
2195
+ printAssistantFinalDebugResult(result, isTruthyFlag(flags.json));
2196
+ return;
2197
+ }
2198
+
2199
+ if (normalizedAction === 'tts-pipeline') {
2200
+ const result = await runLocalTtsPipelineDebugCheck(debugOptions);
2201
+ printTtsPipelineDebugResult(result, isTruthyFlag(flags.json));
2202
+ if (!result.backend?.success) {
2203
+ throw new Error(result.backend?.qwen?.errorCode || 'Local backend TTS replay failed.');
2204
+ }
2205
+ return;
2206
+ }
2207
+
2208
+ throw new Error('Unknown debug action: ' + normalizedAction + '. Use: oomi openclaw debug assistant-final|tts-pipeline');
2209
+ }
1756
2210
 
1757
2211
  function extractCorrelationId(params) {
1758
2212
  if (!params || typeof params !== 'object') return '';
@@ -2987,18 +3441,17 @@ async function startOpenclawBridge(flags) {
2987
3441
 
2988
3442
  gatewaySocket.on('message', runBridgeCallbackSafely((gatewayRaw) => {
2989
3443
  let frame = typeof gatewayRaw === 'string' ? gatewayRaw : gatewayRaw.toString();
2990
- if (classifyBridgeSessionScope(sessionId) === 'voice') {
2991
- const beforeSummary = summarizeVoiceFrameContract(frame);
2992
- const spokenNormalized = ensureVoiceAssistantSpokenMetadata(frame);
2993
- if (spokenNormalized.changed) {
2994
- frame = spokenNormalized.frameText;
3444
+ const spokenNormalized = normalizeAssistantGatewayFrame(sessionId, frame);
3445
+ if (spokenNormalized.changed) {
3446
+ frame = spokenNormalized.frameText;
3447
+ if (spokenNormalized.scope === 'voice') {
2995
3448
  console.log(`[bridge] voice.spoken_metadata.${spokenNormalized.reason} ${sessionId} ${JSON.stringify({
2996
- before: beforeSummary,
3449
+ before: spokenNormalized.summary,
2997
3450
  after: summarizeVoiceFrameContract(frame),
2998
3451
  })}`);
2999
- } else if (beforeSummary.event === 'chat' && beforeSummary.state === 'final') {
3000
- console.log(`[bridge] voice.chat.final ${sessionId} ${JSON.stringify(beforeSummary)}`);
3001
3452
  }
3453
+ } else if (spokenNormalized.scope === 'voice' && spokenNormalized.summary.event === 'chat' && spokenNormalized.summary.state === 'final') {
3454
+ console.log(`[bridge] voice.chat.final ${sessionId} ${JSON.stringify(spokenNormalized.summary)}`);
3002
3455
  }
3003
3456
  const gatewayPayload = parseJsonPayload(frame);
3004
3457
  if (gatewayPayload?.event === 'connect.challenge') {
@@ -4139,10 +4592,15 @@ async function main() {
  return;
  }
 
- if (command === 'openclaw' && subcommand === 'plugin') {
- printOpenclawPluginSetup(args.flags);
- return;
- }
+ if (command === 'openclaw' && subcommand === 'plugin') {
+ printOpenclawPluginSetup(args.flags);
+ return;
+ }
+
+ if (command === 'openclaw' && subcommand === 'debug') {
+ await handleOpenclawDebugCommand(args.positionals[0], args.flags);
+ return;
+ }
 
  if (command === 'personas' && subcommand === 'sync') {
  await syncPersonas({ backendUrl: args.flags['backend-url'], root: args.flags.root });
@@ -4257,7 +4715,9 @@ if (__isDirectExecution) {
 
  export {
  prepareGatewayFrameForLocalGateway,
- ensureVoiceAssistantSpokenMetadata,
+ ensureAssistantSpokenMetadata,
+ normalizeAssistantGatewayFrame,
+ runAssistantFinalDebugCheck,
  buildBridgeLaunchAgentPlist,
  classifyBridgeFailure,
  classifyBridgeSessionScope,
@@ -1,6 +1,10 @@
- function trimString(value, fallback = '') {
- return typeof value === 'string' && value.trim() ? value.trim() : fallback;
- }
+ function trimString(value, fallback = '') {
+ return typeof value === 'string' && value.trim() ? value.trim() : fallback;
+ }
+
+ function stripAvatarCommandTags(text) {
+ return text.replace(/\[(anim|animation|face|expression|emotion|gesture|look|gaze):[^\]]+\]/gi, ' ');
+ }
 
  function clampInteger(value, fallback, { min = 1, max = Number.MAX_SAFE_INTEGER } = {}) {
  if (typeof value !== 'number' || !Number.isFinite(value)) return fallback;
@@ -35,11 +39,11 @@ function inferSpokenLanguage(text) {
  return 'English';
  }
 
- function normalizeSpokenSegment(segment) {
- if (!segment || typeof segment !== 'object' || Array.isArray(segment)) return null;
-
- const text = trimString(segment.text);
- if (!text) return null;
+ function normalizeSpokenSegment(segment) {
+ if (!segment || typeof segment !== 'object' || Array.isArray(segment)) return null;
+
+ const text = normalizeSpeechText(trimString(segment.text));
+ if (!text) return null;
 
  const normalized = { text };
  const pace = trimString(segment.pace);
@@ -61,11 +65,11 @@ function stripEmoji(text) {
  return text.replace(/[\uFE0E\uFE0F]/g, '').replace(/\p{Extended_Pictographic}|\p{Emoji_Presentation}/gu, '');
  }
 
- function normalizeSpeechText(text) {
- return stripEmoji(text)
- .replace(/\*\*(.*?)\*\*/g, '$1')
- .replace(/__(.*?)__/g, '$1')
- .replace(/`([^`]+)`/g, '$1')
+ function normalizeSpeechText(text) {
+ return stripEmoji(stripAvatarCommandTags(text))
+ .replace(/\*\*(.*?)\*\*/g, '$1')
+ .replace(/__(.*?)__/g, '$1')
+ .replace(/`([^`]+)`/g, '$1')
  .replace(/[\u2013\u2014]/g, ', ')
  .replace(/\u2026/g, '...')
  .replace(/\s+/g, ' ')
@@ -76,14 +80,14 @@ function normalizeSpeechText(text) {
  .trim();
  }
 
- function splitSpeechSegments(text) {
- const normalized = normalizeSpeechText(text);
- if (!normalized) return [];
-
- const baseSegments = normalized
- .split(/(?<=[.!?])\s+/)
- .map((segment) => segment.trim())
- .filter(Boolean);
+ function splitSpeechSegments(text) {
+ const normalized = normalizeSpeechText(text);
+ if (!normalized) return [];
+
+ const baseSegments = normalized
+ .split(/(?<=[.!?])\s+|\n+/)
+ .map((segment) => segment.trim())
+ .filter(Boolean);
 
  const segments = [];
  for (const segment of baseSegments) {
@@ -92,19 +96,17 @@ function splitSpeechSegments(text) {
  continue;
  }
 
- const clauseParts = segment
- .split(/,\s+/)
- .map((part) => part.trim())
- .filter(Boolean);
-
- if (clauseParts.length > 1) {
- for (let index = 0; index < clauseParts.length; index += 1) {
- const part = clauseParts[index];
- const needsComma = index < clauseParts.length - 1 && !/[.!?]$/.test(part);
- segments.push(needsComma ? `${part},` : part);
- }
- continue;
- }
+ const clauseParts = segment
+ .split(/(?<=[,;:])\s+/)
+ .map((part) => part.trim())
+ .filter(Boolean);
+
+ if (clauseParts.length > 1) {
+ for (const part of clauseParts) {
+ segments.push(part);
+ }
+ continue;
+ }
 
  segments.push(segment);
  }
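
The new splitting rules in this hunk can be tried in isolation. The two regular expressions below are copied from the `+` lines above; the helper name `splitClausesSketch` is hypothetical, and the sketch leaves out the speech-text normalization and the per-segment checks that sit in lines this diff does not show. It illustrates only the regex behavior, not the package's actual function.

```javascript
// Regexes copied from the 0.2.27 side of the diff.
const SENTENCE_BREAK = /(?<=[.!?])\s+|\n+/; // sentence end, or one or more newlines
const CLAUSE_BREAK = /(?<=[,;:])\s+/;       // whitespace that follows , ; or :

// Hypothetical helper: split into sentences, then into clauses.
// The lookbehind keeps the punctuation attached to the preceding part.
function splitClausesSketch(text) {
  return text
    .split(SENTENCE_BREAK)
    .map((sentence) => sentence.trim())
    .filter(Boolean)
    .flatMap((sentence) => {
      const parts = sentence
        .split(CLAUSE_BREAK)
        .map((part) => part.trim())
        .filter(Boolean);
      return parts.length > 1 ? parts : [sentence];
    });
}

console.log(splitClausesSketch('Got it, one sec; checking now. Done!'));
// → [ 'Got it,', 'one sec;', 'checking now.', 'Done!' ]
```

Unlike the removed `/,\s+/` split, the lookbehind leaves the delimiter on each clause, which is presumably why the `needsComma` re-append loop could be dropped.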
@@ -114,50 +116,62 @@ function splitSpeechSegments(text) {
  return [...segments.slice(0, 4), segments.slice(4).join(' ').trim()];
  }
 
- function inferSegmentStyle(segmentText, index, totalSegments) {
- const normalized = segmentText.toLowerCase();
- const exclamatory = /!/.test(segmentText) || /\b(hell yeah|awesome|amazing|stoked|love|perfect|great)\b/.test(normalized);
- const curious = /\?/.test(segmentText);
- const reflective =
- /\b(i think|i'm|i am|i've|i have|lately|right now|before this|each time|understand|it feels like)\b/.test(normalized) ||
- segmentText.length > 60;
-
- if (curious) {
- return {
- pace: 'medium',
- pitch: 'slightly_high',
- energy: 'warm',
- volume: 'normal',
- pause_after_ms: 0,
- };
- }
+ function inferSegmentStyle(segmentText, index, totalSegments) {
+ const normalized = segmentText.toLowerCase();
+ const greeting = /^(hey|hi|hello|yo)\b/.test(normalized);
+ const exclamatory = /!/.test(segmentText) || /\b(hell yeah|awesome|amazing|stoked|love|perfect|great)\b/.test(normalized);
+ const curious = /\?/.test(segmentText);
+ const reassuring = /\b(got it|no worries|all good|you'?re good|sounds good|totally|absolutely)\b/.test(normalized);
+ const reflective =
+ /\b(i think|i'm|i am|i've|i have|lately|right now|before this|each time|understand|it feels like)\b/.test(normalized) ||
+ segmentText.length > 60;
+
+ if (greeting || reassuring) {
+ return {
+ pace: 'medium_fast',
+ pitch: 'slightly_high',
+ energy: 'bright',
+ volume: 'projected',
+ pause_after_ms: index < totalSegments - 1 ? 180 : 0,
+ };
+ }
+
+ if (curious) {
+ return {
+ pace: 'medium',
+ pitch: 'slightly_high',
+ energy: 'warm',
+ volume: 'projected',
+ pause_after_ms: 0,
+ };
+ }
 
  if (exclamatory) {
  return {
- pace: 'medium_fast',
- pitch: 'slightly_high',
- energy: 'bright',
- volume: 'normal',
- pause_after_ms: index < totalSegments - 1 ? 220 : 0,
- };
- }
-
- if (reflective) {
- return {
- pace: 'medium',
- pitch: 'neutral',
- energy: 'warm',
- volume: 'normal',
- pause_after_ms: index < totalSegments - 1 ? 260 : 0,
- };
- }
-
- return {
- pace: 'medium',
- pitch: 'neutral',
- energy: 'warm',
- volume: 'normal',
- pause_after_ms: index < totalSegments - 1 ? 180 : 0,
+ pace: 'medium_fast',
+ pitch: 'slightly_high',
+ energy: 'bright',
+ volume: 'projected',
+ pause_after_ms: index < totalSegments - 1 ? 220 : 0,
+ };
+ }
+
+ if (reflective) {
+ return {
+ pace: 'slow',
+ pitch: 'slightly_low',
+ energy: 'warm',
+ volume: 'soft',
+ pause_after_ms: index < totalSegments - 1 ? 280 : 0,
+ };
+ }
+
+ return {
+ pace: 'medium',
+ pitch: 'slightly_high',
+ energy: 'warm',
+ volume: 'normal',
+ pause_after_ms: index < totalSegments - 1 ? 180 : 0,
  };
  }
 
@@ -177,11 +191,11 @@ function synthesizeSpokenSegments(text) {
  };
  }
 
- function normalizeSpokenMetadata(spoken) {
- if (!spoken || typeof spoken !== 'object' || Array.isArray(spoken)) return null;
-
- const text = trimString(spoken.text);
- if (!text) return null;
+ function normalizeSpokenMetadata(spoken) {
+ if (!spoken || typeof spoken !== 'object' || Array.isArray(spoken)) return null;
+
+ const text = normalizeSpeechText(trimString(spoken.text));
+ if (!text) return null;
 
  const normalized = { text };
  const language = trimString(spoken.language);
@@ -214,10 +228,10 @@ function normalizeSpokenMetadata(spoken) {
  return normalized;
  }
 
- function inferSpokenMetadataFromContent(content) {
- const text = normalizeSpeechText(trimString(content));
- if (!text) return null;
- const synthesized = synthesizeSpokenSegments(text);
+ function inferSpokenMetadataFromContent(content) {
+ const text = normalizeSpeechText(trimString(content));
+ if (!text) return null;
+ const synthesized = synthesizeSpokenSegments(text);
 
  const normalized = text.toLowerCase();
  const upbeat =
@@ -227,47 +241,48 @@ function inferSpokenMetadataFromContent(content) {
  /\b(sorry|gentle|softly|careful|reassuring|calm|okay|it'?s okay|i know)\b/.test(normalized);
  const curious = /\?/.test(text);
 
- if (upbeat) {
- return {
- text,
- language: synthesized?.language || 'English',
- segments: synthesized?.segments,
- instructions: 'Speak with warm, upbeat conversational energy and natural pacing.',
- style: { emotion: 'upbeat', energy: 'medium' },
- };
- }
-
- if (gentle) {
- return {
- text,
- language: synthesized?.language || 'English',
- segments: synthesized?.segments,
- instructions: 'Speak gently and reassuringly, with a calm pace and soft emphasis.',
- style: { emotion: 'gentle', energy: 'low' },
- };
- }
-
- if (curious) {
- return {
- text,
- language: synthesized?.language || 'English',
- segments: synthesized?.segments,
- instructions: 'Speak naturally with curious, engaged intonation and a conversational pace.',
- style: { emotion: 'curious', energy: 'medium' },
- };
- }
-
- return {
- text,
- language: synthesized?.language || 'English',
- segments: synthesized?.segments,
- instructions: 'Speak naturally with light warmth and conversational pacing.',
- style: { emotion: 'neutral', energy: 'medium' },
- };
- }
-
- export {
- inferSpokenMetadataFromContent,
- normalizeSpokenMetadata,
- normalizeSpeechText,
- };
+ if (upbeat) {
+ return normalizeSpokenMetadata({
+ text,
+ language: synthesized?.language || 'English',
+ segments: synthesized?.segments,
+ instructions: 'Speak with warm, upbeat conversational energy and natural pacing.',
+ style: { emotion: 'upbeat', energy: 'medium' },
+ });
+ }
+
+ if (gentle) {
+ return normalizeSpokenMetadata({
+ text,
+ language: synthesized?.language || 'English',
+ segments: synthesized?.segments,
+ instructions: 'Speak gently and reassuringly, with a calm pace and soft emphasis.',
+ style: { emotion: 'gentle', energy: 'low' },
+ });
+ }
+
+ if (curious) {
+ return normalizeSpokenMetadata({
+ text,
+ language: synthesized?.language || 'English',
+ segments: synthesized?.segments,
+ instructions: 'Speak naturally with curious, engaged intonation and a conversational pace.',
+ style: { emotion: 'curious', energy: 'medium' },
+ });
+ }
+
+ return normalizeSpokenMetadata({
+ text,
+ language: synthesized?.language || 'English',
+ segments: synthesized?.segments,
+ instructions: 'Speak naturally with light warmth and conversational pacing.',
+ style: { emotion: 'neutral', energy: 'medium' },
+ });
+ }
+
+ export {
+ inferSpokenMetadataFromContent,
+ normalizeSpokenMetadata,
+ normalizeSpeechText,
+ stripAvatarCommandTags,
+ };
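
As a quick check on the newly exported `stripAvatarCommandTags`, the sketch below copies the function body from the `+` lines earlier in this file's diff and runs it on a made-up reply. The sample string and the `cleaned` variable are illustrative assumptions; the trailing whitespace collapse mirrors the `.replace(/\s+/g, ' ')` and `.trim()` steps visible in `normalizeSpeechText`, not the full normalizer.

```javascript
// Copied from the 0.2.27 side of the diff: replaces avatar command tags
// such as [anim:wave] or [face:smile] with a single space.
function stripAvatarCommandTags(text) {
  return text.replace(/\[(anim|animation|face|expression|emotion|gesture|look|gaze):[^\]]+\]/gi, ' ');
}

// Illustrative input (an assumption, not taken from the package's tests).
const raw = '[anim:wave] Hey there! [face:smile]';

// Collapse the leftover spaces the same way normalizeSpeechText does afterwards.
const cleaned = stripAvatarCommandTags(raw).replace(/\s+/g, ' ').trim();

console.log(cleaned); // → Hey there!

// Bracketed text with an unlisted prefix is left untouched.
console.log(stripAvatarCommandTags('[note:keep] ok')); // → [note:keep] ok
```

Because each tag is replaced with a space rather than an empty string, words on either side of a tag do not fuse together; the later whitespace collapse removes the extra gap.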
@@ -2,7 +2,7 @@
  "id": "oomi-ai",
  "name": "Oomi Channel Plugin",
  "description": "Managed Oomi channel integration for OpenClaw.",
- "version": "0.2.24",
+ "version": "0.2.27",
  "author": "Oomi",
  "license": "MIT",
  "openclawVersion": ">=0.5.0",
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "oomi-ai",
- "version": "0.2.24",
+ "version": "0.2.27",
  "description": "Oomi OpenClaw channel plugin and bridge tooling",
  "bin": {
  "oomi": "bin/oomi-ai.js"