oomi-ai 0.2.25 → 0.2.27
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +95 -27
- package/bin/oomi-ai.js +503 -43
- package/lib/spokenMetadata.js +49 -51
- package/openclaw.plugin.json +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -4,12 +4,13 @@ OpenClaw channel plugin and bridge tooling for Oomi managed chat and voice.
 
 ## Current Focus
 
-`0.2.
+`0.2.27` keeps the persona automation lane and adds a usable local managed-voice validation path:
 - WebSpatial-based persona scaffolding for generated Oomi apps
 - a high-level `oomi personas create-managed` command for agent-driven persona creation
 - device-authenticated persona runtime registration and job callbacks
 - automatic bridge-side polling for queued `persona_job` control messages
--
+- one shared spoken-metadata normalizer used by both the extension and the bridge
+- a repo-backed local `tts-pipeline` replay that can validate assistant-final -> backend -> real Qwen TTS before publishing
 
 This package is for two audiences:
 - OpenClaw operators who need to connect a machine to Oomi and keep chat or voice healthy
@@ -148,6 +149,46 @@ For managed cloned-voice replies, the canonical contract is:
 
 The backend cloned-voice path is intentionally strict. If `metadata.spoken` does not reach Oomi, backend TTS fails instead of speaking a flat fallback voice.
 
+## Local TTS Validation
+
+If you are developing this package inside the Oomi repo, you can now validate the managed voice path locally before publishing.
+
+This local gate does three things:
+- replays an assistant `chat.final` frame through the same spoken-metadata normalization path used by the OpenClaw extension and the bridge
+- feeds that normalized frame into the Rails backend replay harness
+- optionally calls the real Qwen cloned-voice provider and confirms that audio deltas come back
+
+Important:
+- this is a repo developer workflow, not a generic npm-only operator command
+- it expects the Oomi repo checkout, the Rails backend, and local provider env vars
+- the real-provider replay can auto-enroll a disposable default sample voice profile from `assets/voice/source/nemu-enrollment-sample.mp3`
+
+Assistant-final contract only:
+
+```bash
+oomi openclaw debug assistant-final --text "Hey Justin! How is the testing going?" --json
+```
+
+Full local backend replay:
+
+```bash
+oomi openclaw debug tts-pipeline --text "When your voice reaches me, it gets turned into text, I read it and think about it, then I speak back through the managed chat session." --json
+```
+
+Real Qwen provider replay:
+
+```bash
+oomi openclaw debug tts-pipeline --text "When your voice reaches me, it gets turned into text, I read it and think about it, then I speak back through the managed chat session." --live-provider --env-file .env.local --provider-timeout-ms 20000 --json
+```
+
+What a good result looks like:
+- `backend.success = true`
+- `managed.assistantSpeechFinal.present = true`
+- `qwen.errorCode = null`
+- `qwen.audioDeltaCount > 0` when `--live-provider` is used
+
+This is the preferred pre-publish gate for managed voice regressions, because it is much faster than publishing to npm and testing through a live OpenClaw machine first.
+
 ## Persona Scaffolding
 
 Use the scaffold flow when OpenClaw needs to build a managed persona app that will live inside Oomi:
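The README's "What a good result looks like" list can be checked mechanically. The sketch below is a hypothetical helper (`checkTtsPipelineResult` is not part of the package) that inspects the parsed `--json` output of `oomi openclaw debug tts-pipeline`. It assumes the field layout used by `printTtsPipelineDebugResult` in `bin/oomi-ai.js`, where `managed` and `qwen` sit under the top-level `backend` object, so the README's `managed.assistantSpeechFinal.present` is read as `backend.managed.assistantSpeechFinal.present`. Treating a missing `qwen.errorCode` as `null` is also an assumption.

```javascript
// Hypothetical checker for the `--json` report of `oomi openclaw debug tts-pipeline`.
// Field paths follow printTtsPipelineDebugResult in bin/oomi-ai.js.
// Returns a list of failed conditions; an empty list means the report looks good.
function checkTtsPipelineResult(result) {
  const problems = [];
  const backend = result && result.backend;
  if (!backend) {
    problems.push('backend replay produced no JSON output');
    return problems;
  }
  if (backend.success !== true) problems.push('backend.success is not true');
  if (backend.managed?.assistantSpeechFinal?.present !== true) {
    problems.push('backend.managed.assistantSpeechFinal.present is not true');
  }
  // Assumption: an absent errorCode is treated the same as null.
  const errorCode = backend.qwen?.errorCode ?? null;
  if (errorCode !== null) problems.push(`backend.qwen.errorCode is ${errorCode}`);
  // Audio deltas are only expected when the real provider was used.
  if (result.liveProvider && !(Number(backend.qwen?.audioDeltaCount) > 0)) {
    problems.push('no audio deltas came back from the live provider');
  }
  return problems;
}

// Illustrative report shape, not captured output.
const sample = {
  liveProvider: true,
  backend: {
    success: true,
    managed: { assistantSpeechFinal: { present: true } },
    qwen: { errorCode: null, audioDeltaCount: 12 },
  },
};
console.log(checkTtsPipelineResult(sample)); // []
```

The CLI itself already throws when `backend.success` is falsy, so a helper like this mainly adds the sidecar, error-code and audio-delta checks from the README list.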
@@ -192,7 +233,7 @@ The bridge status file is written locally and should roughly be interpreted as:
 
 For voice support, a `voice_session_*` failure should be treated as narrower than a full provider outage.
 
-## Troubleshooting
+## Troubleshooting
 
 ### `invalid handshake: first request must be connect`
 
@@ -223,19 +264,32 @@ What to check:
 
 If the process is alive but runtime faults are being caught, expect `degraded` rather than an immediate hard stop.
 
-### Voice STT works but the agent does not answer
+### Voice STT works but the agent does not answer
 
 This usually means one of these:
 - the managed gateway/device side is not actually ready
 - the bridge or agent run failed after delivery
 - the OpenClaw run stopped with an upstream provider `network_error`
 
-In that situation, inspect:
-- `~/.openclaw/logs/gateway.log`
-- `~/.openclaw/logs/gateway.err.log`
-- the relevant session JSONL in `~/.openclaw/agents/main/sessions/`
+In that situation, inspect:
+- `~/.openclaw/logs/gateway.log`
+- `~/.openclaw/logs/gateway.err.log`
+- the relevant session JSONL in `~/.openclaw/agents/main/sessions/`
+
+### Voice text works but cloned TTS fails with `MISSING_SPOKEN_METADATA`
+
+Meaning:
+- the assistant text arrived
+- the backend voice relay never received valid hidden `metadata.spoken`
+
+What to check:
+- run the local replay gate before publishing:
+- `oomi openclaw debug assistant-final --text "..."`
+- `oomi openclaw debug tts-pipeline --text "..."`
+- if the package local replay succeeds but the live machine fails, verify the OpenClaw machine is actually running the updated bridge binary
+- if the local replay fails, fix the assistant-final contract first instead of debugging the browser or backend deployment
 
-## Developer Notes
+## Developer Notes
 
 If you are inspecting this package on npm, the main architectural points are:
 - the extension path is the stable managed text contract
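For the `MISSING_SPOKEN_METADATA` case it helps to know where the hidden field lives. In the frames this diff builds and inspects (`buildAssistantFinalDebugFrame` and `extractSpokenMetadata` in `bin/oomi-ai.js`), it sits at `payload.message.metadata.spoken`. The sketch below locates that field in a raw gateway frame. `findSpokenMetadata` is a hypothetical name, and its "non-empty `spoken.text`" rule is an assumption standing in for the package's real `normalizeSpokenMetadata`, whose logic lives in `lib/spokenMetadata.js` and is not shown here.

```javascript
// Sketch: find hidden spoken metadata in an assistant chat.final frame.
// Frame shape mirrors buildAssistantFinalDebugFrame in bin/oomi-ai.js.
// Assumption: "valid" means an object with a non-empty string `text`.
function findSpokenMetadata(frameText) {
  let frame;
  try {
    frame = JSON.parse(frameText);
  } catch {
    return null; // not JSON, so there is nothing to relay
  }
  const metadata = frame?.payload?.message?.metadata;
  if (!metadata || typeof metadata !== 'object' || Array.isArray(metadata)) return null;
  const spoken = metadata.spoken;
  if (!spoken || typeof spoken.text !== 'string' || !spoken.text.trim()) return null;
  return spoken;
}

// A frame shaped like the debug command's input, before normalization: no metadata yet.
const bareFrame = JSON.stringify({
  type: 'event',
  event: 'chat',
  payload: {
    sessionKey: 'agent:main:webchat:channel:oomi',
    state: 'final',
    message: { role: 'assistant', content: 'Hey Justin! How is the testing going?' },
  },
});
// null: a frame like this reaching the backend unnormalized is the MISSING_SPOKEN_METADATA case.
console.log(findSpokenMetadata(bareFrame));
```

This is only a quick inspection aid; `oomi openclaw debug assistant-final --json` reports the package's own verdict in its `spoken` and `after.spokenNormalized` fields.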
@@ -248,24 +302,38 @@ If you are inspecting this package on npm, the main architectural points are:
 - runtime fault isolation so local session failures are less likely to crash the whole provider
 - one shared hidden managed-voice speech metadata helper used by both the extension and the local bridge
 
-If you are developing the plugin, test the packaged surface with:
-
-```bash
-cd packages/oomi-ai
-node --test test/*.test.mjs
-npm pack --dry-run
-```
-
-
-
-
-
-
-
-
-
-
-
+If you are developing the plugin, test the packaged surface with:
+
+```bash
+cd packages/oomi-ai
+node --test test/*.test.mjs
+npm pack --dry-run
+```
+
+For managed voice changes, do not stop at the package tests. Run the local replay gate from the repo root as well, especially before publishing:
+
+```bash
+oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --json
+oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --live-provider --env-file .env.local --provider-timeout-ms 20000 --json
+```
+
+## Release Process
+
+Before publishing:
+
+```bash
+cd packages/oomi-ai
+node --test test/*.test.mjs
+npm pack --dry-run
+```
+
+For voice-related changes, also run the repo-backed local replay gate before publish:
+
+```bash
+oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --json
+oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --live-provider --env-file .env.local --provider-timeout-ms 20000 --json
+```
+
 Then publish the bumped version:
 
 ```bash
package/bin/oomi-ai.js
CHANGED
@@ -39,13 +39,21 @@ const BRIDGE_CONNECT_CHALLENGE_TIMEOUT_MS = parsePositiveInteger(
   process.env.OOMI_BRIDGE_CONNECT_CHALLENGE_TIMEOUT_MS,
   3000
 );
-const BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS = parsePositiveInteger(
-  process.env.OOMI_BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS,
-  30000
-);
-const BRIDGE_LAUNCHD_LABEL = 'ai.oomi.bridge';
-const
-
+const BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS = parsePositiveInteger(
+  process.env.OOMI_BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS,
+  30000
+);
+const BRIDGE_LAUNCHD_LABEL = 'ai.oomi.bridge';
+const DEBUG_PROVIDER_ENV_KEYS = [
+  'QWEN_REALTIME_API_KEY',
+  'QWEN_REALTIME_BASE_URL',
+  'QWEN_REALTIME_ASR_MODEL',
+  'QWEN_REALTIME_TTS_MODEL',
+  'QWEN_REALTIME_TTS_VOICE',
+  'QWEN_REALTIME_LANGUAGE',
+];
+const DEVICE_IDENTITY_PATH = path.join(os.homedir(), '.openclaw', 'identity', 'device.json');
+const ED25519_SPKI_PREFIX = Buffer.from('302a300506032b6570032100', 'hex');
 
 function parsePositiveInteger(value, fallback) {
   const num = Number(value);
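`ED25519_SPKI_PREFIX` is the fixed 12-byte DER header of an Ed25519 SubjectPublicKeyInfo (RFC 8410): a raw 32-byte public key with this prefix in front is a 44-byte DER SPKI key, and going the other way is a slice. The diff only adds the constant next to `DEVICE_IDENTITY_PATH`; how the bridge actually uses it is not shown, so the two helpers below are a hypothetical illustration of the standard conversion, not package code.

```javascript
// DER header of an Ed25519 SubjectPublicKeyInfo (RFC 8410):
// 30 2a            SEQUENCE, 42 bytes
//   30 05          SEQUENCE, 5 bytes
//     06 03 2b6570 OID 1.3.101.112 (Ed25519)
//   03 21 00       BIT STRING, 33 bytes, 0 unused bits, then the 32 key bytes
const ED25519_SPKI_PREFIX = Buffer.from('302a300506032b6570032100', 'hex');

// Hypothetical helper: raw 32-byte public key -> 44-byte DER SPKI.
function spkiFromRawEd25519(raw) {
  if (raw.length !== 32) throw new Error(`expected 32 raw key bytes, got ${raw.length}`);
  return Buffer.concat([ED25519_SPKI_PREFIX, raw]);
}

// Hypothetical helper: 44-byte DER SPKI -> raw 32-byte public key.
function rawFromSpkiEd25519(der) {
  const ok =
    der.length === ED25519_SPKI_PREFIX.length + 32 &&
    der.subarray(0, ED25519_SPKI_PREFIX.length).equals(ED25519_SPKI_PREFIX);
  if (!ok) throw new Error('not an Ed25519 SPKI public key');
  return der.subarray(ED25519_SPKI_PREFIX.length);
}

const raw = Buffer.alloc(32, 7); // placeholder bytes, not a real key
const der = spkiFromRawEd25519(raw);
console.log(der.length); // 44
console.log(rawFromSpkiEd25519(der).equals(raw)); // true
```

In Node, the DER output can be loaded with `crypto.createPublicKey({ key: der, format: 'der', type: 'spki' })`, and `keyObject.export({ type: 'spki', format: 'der' })` on an Ed25519 public key produces input suitable for `rawFromSpkiEd25519`.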
@@ -169,10 +177,14 @@ Commands:
 openclaw install
 Install agent instructions and the Oomi skill into OpenClaw.
 
-openclaw bridge [start|ensure|stop|restart|ps]
-Manage local OpenClaw-to-Oomi bridge lifecycle (singleton).
-openclaw bridge service [install|start|stop|restart|status|uninstall]
-Manage macOS launchd bridge supervision.
+openclaw bridge [start|ensure|stop|restart|ps]
+Manage local OpenClaw-to-Oomi bridge lifecycle (singleton).
+openclaw bridge service [install|start|stop|restart|status|uninstall]
+Manage macOS launchd bridge supervision.
+openclaw debug assistant-final
+Replay an assistant chat.final frame through spoken-metadata normalization.
+openclaw debug tts-pipeline
+Replay an assistant chat.final through local backend voice handling.
 
 openclaw pair
 Pair this OpenClaw host with Oomi and start bridge (single command).
@@ -225,9 +237,20 @@ Common flags:
 --device-id ID Bridge device identifier (default: host name)
 --device-token TOKEN Existing bridge device token
 --show-secrets Print full token values in diagnostic output
---json Print pairing result as JSON (for automation)
---
---
+--json Print pairing result as JSON (for automation)
+--text TEXT Assistant text for local debug frame replay
+--frame-file PATH Read a raw gateway frame from disk for local debug replay
+--frame-json JSON Use raw gateway frame JSON text for local debug replay
+--session-id ID Debug session id override (default: ms_debug_local)
+--user-text TEXT User utterance text used for backend voice replay
+--live-provider Use the real Qwen TTS provider in local debug replay
+--env-file PATH Load provider env vars from a specific env file (default: <repo>/.env.local)
+--provider-timeout-ms N
+Timeout in ms for live provider audio during local debug replay
+--backend-url URL Override Oomi backend URL
+--root PATH Override repo root path for persona discovery
+--role ROLE Message role override for local debug frame replay
+--omit-role Omit message.role in the generated local debug frame
 --name NAME Persona display name (for create)
 --description TEXT Persona description (for scaffold)
 --slug SLUG Explicit slug override (for create-managed)
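The new debug flags combine in a fixed order. Reading `handleOpenclawDebugCommand` and `runAssistantFinalDebugCheck`, `--frame-file` takes precedence over `--frame-json`, and only when neither supplies a frame is a `chat.final` frame built from `--text`, with `role: 'assistant'` unless `--role` overrides it or `--omit-role` drops it. The function below is a simplified, hypothetical summary of that precedence: `describeDebugInput` is not package code, `readFile` is injected so no filesystem is needed, any truthy `omit-role` value counts as set (the real code goes through `isTruthyFlag`), and the edge case of an empty frame file is ignored.

```javascript
// Simplified, hypothetical model of how the debug commands pick their input.
function describeDebugInput(flags, readFile) {
  const nonEmpty = (v) => typeof v === 'string' && v.trim() !== '';
  // A raw frame wins: --frame-file first, then --frame-json.
  if (nonEmpty(flags['frame-file'])) {
    return { source: 'frame-file', frameText: readFile(flags['frame-file']) };
  }
  if (nonEmpty(flags['frame-json'])) {
    return { source: 'frame-json', frameText: flags['frame-json'] };
  }
  // Otherwise a chat.final frame is synthesized from --text.
  if (!nonEmpty(flags.text)) {
    throw new Error('Assistant text or frame input is required.');
  }
  // --omit-role removes message.role; --role overrides the 'assistant' default.
  const role = flags['omit-role'] ? '' : nonEmpty(flags.role) ? flags.role.trim() : 'assistant';
  return { source: 'text', text: flags.text, role };
}

const fakeRead = () => '{"type":"event"}';
console.log(describeDebugInput({ text: 'Hi' }, fakeRead).role); // assistant
console.log(describeDebugInput({ 'frame-json': '{}', text: 'Hi' }, fakeRead).source); // frame-json
console.log(describeDebugInput({ 'frame-file': 'f.json', 'frame-json': '{}' }, fakeRead).source); // frame-file
```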
@@ -261,13 +284,43 @@ function readFile(filePath) {
   return fs.readFileSync(filePath, 'utf-8');
 }
 
-function writeFile(filePath, content, options = undefined) {
-  fs.writeFileSync(filePath, content, options);
-}
-
-function
-
-
+function writeFile(filePath, content, options = undefined) {
+  fs.writeFileSync(filePath, content, options);
+}
+
+function parseDotEnvLine(line) {
+  const trimmed = String(line || '').trim();
+  if (!trimmed || trimmed.startsWith('#')) return null;
+  const separatorIndex = trimmed.indexOf('=');
+  if (separatorIndex <= 0) return null;
+  const key = trimmed.slice(0, separatorIndex).trim();
+  if (!key) return null;
+  let value = trimmed.slice(separatorIndex + 1).trim();
+  if ((value.startsWith('"') && value.endsWith('"')) || (value.startsWith("'") && value.endsWith("'"))) {
+    value = value.slice(1, -1);
+  }
+  return { key, value };
+}
+
+function loadEnvFile(filePath, keys = []) {
+  if (!filePath || !fs.existsSync(filePath)) {
+    throw new Error(`Environment file not found: ${filePath}`);
+  }
+  const selectedKeys = Array.isArray(keys) && keys.length ? new Set(keys) : null;
+  const entries = {};
+  const lines = readFile(filePath).split(/\r?\n/);
+  for (const line of lines) {
+    const parsed = parseDotEnvLine(line);
+    if (!parsed) continue;
+    if (selectedKeys && !selectedKeys.has(parsed.key)) continue;
+    entries[parsed.key] = parsed.value;
+  }
+  return entries;
+}
+
+function xmlEscape(value) {
+  return String(value)
+    .replaceAll('&', '&amp;')
     .replaceAll('<', '&lt;')
     .replaceAll('>', '&gt;')
     .replaceAll('"', '&quot;')
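`loadEnvFile` reads `--env-file` (default `<repo>/.env.local`) line by line and keeps only the keys in `DEBUG_PROVIDER_ENV_KEYS`; all of its parsing rules live in `parseDotEnvLine`. The function is reproduced below from the diff with comments added, followed by sample lines (illustrative values, not real credentials) that exercise its edge cases.

```javascript
// parseDotEnvLine as added in this diff, with explanatory comments.
function parseDotEnvLine(line) {
  const trimmed = String(line || '').trim();
  // Blank lines and full-line comments are skipped.
  if (!trimmed || trimmed.startsWith('#')) return null;
  // Split on the first '='; a line that starts with '=' has no key.
  const separatorIndex = trimmed.indexOf('=');
  if (separatorIndex <= 0) return null;
  const key = trimmed.slice(0, separatorIndex).trim();
  if (!key) return null;
  let value = trimmed.slice(separatorIndex + 1).trim();
  // One pair of matching surrounding quotes is removed; there is no escape handling.
  if ((value.startsWith('"') && value.endsWith('"')) || (value.startsWith("'") && value.endsWith("'"))) {
    value = value.slice(1, -1);
  }
  return { key, value };
}

const samples = [
  'QWEN_REALTIME_API_KEY="sk-example"',
  "QWEN_REALTIME_LANGUAGE='en'",
  'QWEN_REALTIME_BASE_URL=https://example.invalid/v1?a=b',
  '# comment',
  'export QWEN_REALTIME_TTS_VOICE=demo',
  'QWEN_REALTIME_TTS_MODEL=model # trailing note',
];
for (const line of samples) console.log(JSON.stringify(parseDotEnvLine(line)));
```

Two consequences follow. An `export`-prefixed line yields the key `export QWEN_REALTIME_TTS_VOICE`, which is not in `DEBUG_PROVIDER_ENV_KEYS`, so `loadEnvFile` silently drops it; and an inline `# ...` after a value stays part of the value (`model # trailing note`). Because only the first `=` splits the line, URLs with query strings survive intact.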
@@ -356,9 +409,9 @@ function ensureDir(dirPath) {
   }
 }
 
-function findRepoRoot(startDir) {
-  let current = startDir;
-  for (let i = 0; i < 6; i += 1) {
+function findRepoRoot(startDir) {
+  let current = startDir;
+  for (let i = 0; i < 6; i += 1) {
     const personasDir = path.join(current, 'personas');
     const skillsDir = path.join(current, 'skills', 'oomi');
     if (fs.existsSync(personasDir) || fs.existsSync(skillsDir)) {
@@ -367,11 +420,23 @@ function findRepoRoot(startDir) {
     const parent = path.dirname(current);
     if (parent === current) break;
     current = parent;
-  }
-  return null;
-}
-
-function
+  }
+  return null;
+}
+
+function resolveRepoRoot(rootFlag) {
+  const explicitRoot =
+    typeof rootFlag === 'string' && rootFlag.trim()
+      ? path.resolve(rootFlag.trim())
+      : '';
+  const repoRoot = explicitRoot || findRepoRoot(process.cwd()) || findRepoRoot(PACKAGE_ROOT);
+  if (!repoRoot) {
+    throw new Error('Could not locate repo root. Use --root <repo root>.');
+  }
+  return repoRoot;
+}
+
+function resolveSkillSource(cliRoot) {
   const packaged = path.join(PACKAGE_ROOT, 'skills', 'oomi');
   if (fs.existsSync(packaged)) {
     return packaged;
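`resolveRepoRoot` (behind `--root` and both debug commands) tries an explicit `--root` first, then walks up from the current directory, then from the package's install location. `findRepoRoot` accepts a directory containing `personas/` or `skills/oomi/`, and it examines at most six directories: the start directory plus five ancestors. The dependency-free model below reproduces that order against a fake filesystem. It is a sketch under stated assumptions: POSIX-style string paths, an injected `exists` predicate in place of `fs.existsSync`, hand-written `dirnamePosix`/`joinPosix` helpers in place of `path`, and, unlike the real code, no `path.resolve` on an explicit root.

```javascript
// Hypothetical, dependency-free model of findRepoRoot/resolveRepoRoot from this diff.
function dirnamePosix(p) {
  const idx = p.lastIndexOf('/');
  return idx <= 0 ? '/' : p.slice(0, idx);
}

function joinPosix(dir, rel) {
  return dir === '/' ? `/${rel}` : `${dir}/${rel}`;
}

function findRepoRoot(startDir, exists) {
  let current = startDir;
  // At most six directories are checked: startDir and five ancestors.
  for (let i = 0; i < 6; i += 1) {
    if (exists(joinPosix(current, 'personas')) || exists(joinPosix(current, 'skills/oomi'))) {
      return current;
    }
    const parent = dirnamePosix(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return null;
}

function resolveRepoRoot(rootFlag, { cwd, packageRoot, exists }) {
  const explicitRoot = typeof rootFlag === 'string' && rootFlag.trim() ? rootFlag.trim() : '';
  const repoRoot = explicitRoot || findRepoRoot(cwd, exists) || findRepoRoot(packageRoot, exists);
  if (!repoRoot) throw new Error('Could not locate repo root. Use --root <repo root>.');
  return repoRoot;
}

const fakeFs = new Set(['/home/dev/oomi/personas']);
const exists = (p) => fakeFs.has(p);
console.log(
  resolveRepoRoot(undefined, {
    cwd: '/home/dev/oomi/packages/oomi-ai',
    packageRoot: '/usr/lib/node_modules/oomi-ai',
    exists,
  })
); // /home/dev/oomi
```

Run from more than five levels below the repo root, or from outside the checkout, the working-directory walk fails and resolution falls back to the package location, so passing `--root` explicitly is the predictable option.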
@@ -1698,7 +1763,7 @@ function summarizeVoiceFrameContract(frameText) {
   };
 }
 
-function
+function ensureAssistantSpokenMetadata(frameText) {
   const frame = parseJsonPayload(frameText);
   if (!frame || typeof frame !== 'object') {
     return { frameText, changed: false, reason: '' };
@@ -1753,6 +1818,395 @@ function ensureVoiceAssistantSpokenMetadata(frameText) {
|
|
|
1753
1818
|
reason: normalizedExplicitSpoken ? 'normalized' : (messageRole ? 'synthesized' : 'synthesized_missing_role'),
|
|
1754
1819
|
};
|
|
1755
1820
|
}
|
|
1821
|
+
|
|
1822
|
+
function normalizeAssistantGatewayFrame(sessionId, frameText) {
|
|
1823
|
+
const scope = classifyBridgeSessionScope(sessionId);
|
|
1824
|
+
const summary = summarizeVoiceFrameContract(frameText);
|
|
1825
|
+
if (!summary.parseable || summary.event !== 'chat' || summary.state !== 'final') {
|
|
1826
|
+
return {
|
|
1827
|
+
frameText,
|
|
1828
|
+
changed: false,
|
|
1829
|
+
reason: '',
|
|
1830
|
+
scope,
|
|
1831
|
+
summary,
|
|
1832
|
+
};
|
|
1833
|
+
}
|
|
1834
|
+
|
|
1835
|
+
const normalized = ensureAssistantSpokenMetadata(frameText);
|
|
1836
|
+
return {
|
|
1837
|
+
...normalized,
|
|
1838
|
+
scope,
|
|
1839
|
+
summary,
|
|
1840
|
+
};
|
|
1841
|
+
}
|
|
1842
|
+
|
|
1843
|
+
function buildAssistantFinalDebugFrame({ sessionKey, text, role }) {
|
|
1844
|
+
const trimmedSessionKey =
|
|
1845
|
+
typeof sessionKey === 'string' && sessionKey.trim()
|
|
1846
|
+
? sessionKey.trim()
|
|
1847
|
+
: 'agent:main:webchat:channel:oomi';
|
|
1848
|
+
const message = {
|
|
1849
|
+
content: String(text || ''),
|
|
1850
|
+
};
|
|
1851
|
+
if (typeof role === 'string' && role.trim()) {
|
|
1852
|
+
message.role = role.trim();
|
|
1853
|
+
}
|
|
1854
|
+
return JSON.stringify({
|
|
1855
|
+
type: 'event',
|
|
1856
|
+
event: 'chat',
|
|
1857
|
+
payload: {
|
|
1858
|
+
sessionKey: trimmedSessionKey,
|
|
1859
|
+
state: 'final',
|
|
1860
|
+
message,
|
|
1861
|
+
},
|
|
1862
|
+
});
|
|
1863
|
+
}
|
|
1864
|
+
|
|
1865
|
+
function extractSpokenMetadata(frameText) {
|
|
1866
|
+
const payload = parseJsonPayload(frameText);
|
|
1867
|
+
const message =
|
|
1868
|
+
payload &&
|
|
1869
|
+
payload.payload &&
|
|
1870
|
+
typeof payload.payload === 'object' &&
|
|
1871
|
+
payload.payload.message &&
|
|
1872
|
+
typeof payload.payload.message === 'object'
|
|
1873
|
+
? payload.payload.message
|
|
1874
|
+
: null;
|
|
1875
|
+
const metadata =
|
|
1876
|
+
message &&
|
|
1877
|
+
message.metadata &&
|
|
1878
|
+
typeof message.metadata === 'object' &&
|
|
1879
|
+
!Array.isArray(message.metadata)
|
|
1880
|
+
? message.metadata
|
|
1881
|
+
: {};
|
|
1882
|
+
return normalizeSpokenMetadata(metadata.spoken);
|
|
1883
|
+
}
|
|
1884
|
+
|
|
1885
|
+
function runAssistantFinalDebugCheck(options = {}) {
|
|
1886
|
+
const sessionId =
|
|
1887
|
+
typeof options.sessionId === 'string' && options.sessionId.trim()
|
|
1888
|
+
? options.sessionId.trim()
|
|
1889
|
+
: 'ms_debug_local';
|
|
1890
|
+
const sessionKey =
|
|
1891
|
+
typeof options.sessionKey === 'string' && options.sessionKey.trim()
|
|
1892
|
+
? options.sessionKey.trim()
|
|
1893
|
+
: 'agent:main:webchat:channel:oomi';
|
|
1894
|
+
const role =
|
|
1895
|
+
options.omitRole
|
|
1896
|
+
? ''
|
|
1897
|
+
: (typeof options.role === 'string' && options.role.trim() ? options.role.trim() : 'assistant');
|
|
1898
|
+
|
|
1899
|
+
const rawFrameText =
|
|
1900
|
+
typeof options.frameText === 'string' && options.frameText.trim()
|
|
1901
|
+
? options.frameText
|
|
1902
|
+
: buildAssistantFinalDebugFrame({
|
|
1903
|
+
sessionKey,
|
|
1904
|
+
text: options.text,
|
|
1905
|
+
role,
|
|
1906
|
+
});
|
|
1907
|
+
|
|
1908
|
+
const before = summarizeVoiceFrameContract(rawFrameText);
|
|
1909
|
+
const normalized = normalizeAssistantGatewayFrame(sessionId, rawFrameText);
|
|
1910
|
+
const after = summarizeVoiceFrameContract(normalized.frameText);
|
|
1911
|
+
const spoken = extractSpokenMetadata(normalized.frameText);
|
|
1912
|
+
|
|
1913
|
+
return {
|
|
1914
|
+
sessionId,
|
|
1915
|
+
sessionKey,
|
|
1916
|
+
scope: normalized.scope,
|
|
1917
|
+
changed: normalized.changed,
|
|
1918
|
+
reason: normalized.reason,
|
|
1919
|
+
before,
|
|
1920
|
+
after,
|
|
1921
|
+
spoken,
|
|
1922
|
+
frameText: normalized.frameText,
|
|
1923
|
+
};
|
|
1924
|
+
}
|
|
1925
|
+
|
|
1926
|
+
function printAssistantFinalDebugResult(result, asJson) {
|
|
1927
|
+
if (asJson) {
|
|
1928
|
+
console.log(JSON.stringify(result, null, 2));
|
|
1929
|
+
return;
|
|
1930
|
+
}
|
|
1931
|
+
|
|
1932
|
+
console.log(`Session id: ${result.sessionId}`);
|
|
1933
|
+
console.log(`Session key: ${result.sessionKey}`);
|
|
1934
|
+
console.log(`Scope: ${result.scope}`);
|
|
1935
|
+
console.log(`Changed: ${result.changed ? 'yes' : 'no'}${result.reason ? ` (${result.reason})` : ''}`);
|
|
1936
|
+
console.log(
|
|
1937
|
+
`Before: event=${result.before.event || '<none>'} state=${result.before.state || '<none>'} role=${result.before.role || '<none>'} spoken=${result.before.spokenNormalized ? 'yes' : 'no'}`
|
|
1938
|
+
);
|
|
1939
|
+
console.log(
|
|
1940
|
+
`After: event=${result.after.event || '<none>'} state=${result.after.state || '<none>'} role=${result.after.role || '<none>'} spoken=${result.after.spokenNormalized ? 'yes' : 'no'}`
|
|
1941
|
+
);
|
|
1942
|
+
if (result.spoken) {
|
|
1943
|
+
console.log(`Spoken text: ${result.spoken.text}`);
|
|
1944
|
+
console.log(`Segments: ${Array.isArray(result.spoken.segments) ? result.spoken.segments.length : 0}`);
|
|
1945
|
+
if (typeof result.spoken.instructions === 'string' && result.spoken.instructions.trim()) {
|
|
1946
|
+
console.log(`Instructions: ${result.spoken.instructions}`);
|
|
1947
|
+
}
|
|
1948
|
+
} else {
|
|
1949
|
+
console.log('Spoken text: <missing>');
|
|
1950
|
+
}
|
|
1951
|
+
}
|
|
1952
|
+
|
|
1953
|
+
function resolveCommandFromPath(commandName) {
|
|
1954
|
+
const normalized = String(commandName || '').trim();
|
|
1955
|
+
if (!normalized) return '';
|
|
1956
|
+
try {
|
|
1957
|
+
const probe = spawnSync(process.platform === 'win32' ? 'where' : 'which', [normalized], {
|
|
1958
|
+
encoding: 'utf8',
|
|
1959
|
+
stdio: ['ignore', 'pipe', 'ignore'],
|
|
1960
|
+
});
|
|
1961
|
+
if (probe.status !== 0) return '';
|
|
1962
|
+
const firstLine = String(probe.stdout || '')
|
|
1963
|
+
.split(/\r?\n/)
|
|
1964
|
+
.map((line) => line.trim())
|
|
1965
|
+
.find(Boolean);
|
|
1966
|
+
return firstLine || '';
|
|
1967
|
+
} catch {
|
|
1968
|
+
return '';
|
|
1969
|
+
}
|
|
1970
|
+
}
|
|
1971
|
+
|
|
1972
|
+
function resolveExecutable(candidates = []) {
|
|
1973
|
+
for (const candidate of candidates) {
|
|
1974
|
+
if (!candidate) continue;
|
|
1975
|
+
const value = String(candidate).trim();
|
|
1976
|
+
if (!value) continue;
|
|
1977
|
+
if (path.isAbsolute(value) && fs.existsSync(value)) {
|
|
1978
|
+
return value;
|
|
1979
|
+
}
|
|
1980
|
+
if (value.includes(path.sep) || value.includes('/')) {
|
|
1981
|
+
const resolved = path.resolve(value);
|
|
1982
|
+
if (fs.existsSync(resolved)) {
|
|
1983
|
+
return resolved;
|
|
1984
|
+
}
|
|
1985
|
+
continue;
|
|
1986
|
+
}
|
|
1987
|
+
const fromPath = resolveCommandFromPath(value);
|
|
1988
|
+
if (fromPath) {
|
|
1989
|
+
return fromPath;
|
|
1990
|
+
}
|
|
1991
|
+
}
|
|
1992
|
+
return '';
|
|
1993
|
+
}
|
|
1994
|
+
|
|
1995
|
+
function resolveBackendRoot(rootFlag) {
|
|
1996
|
+
const repoRoot = resolveRepoRoot(rootFlag);
|
|
1997
|
+
const backendRoot = path.join(repoRoot, 'apps', 'backend');
|
|
1998
|
+
if (!fs.existsSync(backendRoot)) {
|
|
1999
|
+
throw new Error(`Could not locate backend app at ${backendRoot}`);
|
|
2000
|
+
}
|
|
2001
|
+
return backendRoot;
|
|
2002
|
+
}
|
|
2003
|
+
|
|
2004
|
+
function resolveRubyExecutable() {
|
|
2005
|
+
const candidates = [
|
|
2006
|
+
process.env.OOMI_RUBY_BIN,
|
|
2007
|
+
process.env.RUBY,
|
|
2008
|
+
process.platform === 'win32' ? 'ruby.exe' : 'ruby',
|
|
2009
|
+
process.platform === 'win32' ? 'ruby' : '',
|
|
2010
|
+
process.platform === 'win32' ? 'C:\\Ruby33-x64\\bin\\ruby.exe' : '',
|
|
2011
|
+
];
|
|
2012
|
+
const executable = resolveExecutable(candidates);
|
|
2013
|
+
if (!executable) {
|
|
2014
|
+
throw new Error('Ruby executable not found. Set OOMI_RUBY_BIN or install Ruby locally.');
|
|
2015
|
+
}
|
|
2016
|
+
return executable;
|
|
2017
|
+
}
|
|
2018
|
+
|
|
2019
|
+
function resolveBundleExecutable() {
|
|
2020
|
+
const candidates = [
|
|
2021
|
+
process.env.OOMI_BUNDLE_BIN,
|
|
2022
|
+
process.platform === 'win32' ? 'bundle.bat' : 'bundle',
|
|
2023
|
+
'bundle',
|
|
2024
|
+
process.platform === 'win32' ? 'C:\\Ruby33-x64\\bin\\bundle.bat' : '',
|
|
2025
|
+
];
|
|
2026
|
+
const executable = resolveExecutable(candidates);
|
|
2027
|
+
if (!executable) {
|
|
2028
|
+
throw new Error('Bundler executable not found. Set OOMI_BUNDLE_BIN or install Bundler locally.');
|
|
2029
|
+
}
|
|
2030
|
+
return executable;
|
|
2031
|
+
}
|
|
2032
|
+
|
|
2033
|
+
function shellQuote(value) {
|
|
2034
|
+
const text = String(value);
|
|
2035
|
+
if (process.platform === 'win32') {
|
|
2036
|
+
return `"${text.replace(/"/g, '""')}"`;
|
|
2037
|
+
}
|
|
2038
|
+
return `'${text.replace(/'/g, `'\\''`)}'`;
|
|
2039
|
+
}
|
|
2040
|
+
|
|
2041
|
+
async function runBundledRubyScript({ backendRoot, scriptPath, inputFile, env = undefined }) {
|
|
2042
|
+
const rubyExecutable = resolveRubyExecutable();
|
|
2043
|
+
const bundleExecutable = resolveBundleExecutable();
|
|
2044
|
+
const commandText = process.platform === 'win32'
|
|
2045
|
+
? [bundleExecutable, 'exec', rubyExecutable, scriptPath, '--input-file', inputFile].map(shellQuote).join(' ')
|
|
2046
|
+
: '';
|
|
2047
|
+
const childEnv = env ? { ...process.env, ...env } : process.env;
|
|
2048
|
+
|
|
2049
|
+
return await new Promise((resolve, reject) => {
|
|
2050
|
+
const child = process.platform === 'win32'
|
|
2051
|
+
? spawn(commandText, [], {
|
|
2052
|
+
cwd: backendRoot,
|
|
2053
|
+
shell: true,
|
|
2054
|
+
env: childEnv,
|
|
2055
|
+
stdio: ['ignore', 'pipe', 'pipe'],
|
|
2056
|
+
})
|
|
2057
|
+
: spawn(bundleExecutable, ['exec', rubyExecutable, scriptPath, '--input-file', inputFile], {
|
|
2058
|
+
cwd: backendRoot,
|
|
2059
|
+
env: childEnv,
|
|
2060
|
+
stdio: ['ignore', 'pipe', 'pipe'],
|
|
2061
|
+
});
|
|
2062
|
+
|
|
2063
|
+
let stdout = '';
|
|
2064
|
+
let stderr = '';
|
|
2065
|
+
child.stdout.on('data', (chunk) => {
|
|
2066
|
+
stdout += chunk.toString();
|
|
2067
|
+
});
|
|
2068
|
+
child.stderr.on('data', (chunk) => {
|
|
2069
|
+
stderr += chunk.toString();
|
|
2070
|
+
});
|
|
2071
|
+
child.on('error', reject);
|
|
2072
|
+
child.on('close', (code) => {
|
|
2073
|
+
resolve({ code: Number(code || 0), stdout, stderr });
|
|
2074
|
+
});
|
|
2075
|
+
});
|
|
2076
|
+
}
|
|
2077
|
+
|
|
2078
|
+
async function runLocalTtsPipelineDebugCheck(options = {}) {
|
|
2079
|
+
const assistant = runAssistantFinalDebugCheck(options);
|
|
2080
|
+
const repoRoot = resolveRepoRoot(options.root);
|
|
2081
|
+
const backendRoot = resolveBackendRoot(options.root);
|
|
2082
|
+
const scriptPath = path.join(backendRoot, 'bin', 'voice_tts_replay.rb');
|
|
2083
|
+
if (!fs.existsSync(scriptPath)) {
|
|
2084
|
+
throw new Error(`Backend replay script not found: ${scriptPath}`);
|
|
2085
|
+
}
|
|
2086
|
+
|
|
2087
|
+
const inputPayload = {
|
|
2088
|
+
repoRoot,
|
|
2089
|
+
sessionId: assistant.sessionId,
|
|
2090
|
+
sessionKey: assistant.sessionKey,
|
|
2091
|
+
frameText: assistant.frameText,
|
|
2092
|
+
userText:
|
|
2093
|
+
typeof options.userText === 'string' && options.userText.trim()
|
|
2094
|
+
? options.userText.trim()
|
|
2095
|
+
: 'local debug utterance',
|
|
2096
|
+
liveProvider: Boolean(options.liveProvider),
|
|
2097
|
+
providerTimeoutMs: parsePositiveInteger(options.providerTimeoutMs, 15000),
|
|
2098
|
+
};
|
|
2099
|
+
let childEnv = undefined;
|
|
2100
|
+
let resolvedEnvFile = '';
|
|
2101
|
+
if (options.liveProvider) {
|
|
2102
|
+
resolvedEnvFile =
|
|
2103
|
+
typeof options.envFile === 'string' && options.envFile.trim()
|
|
2104
|
+
? path.resolve(options.envFile.trim())
|
|
2105
|
+
: path.join(repoRoot, '.env.local');
|
|
2106
|
+
childEnv = loadEnvFile(resolvedEnvFile, DEBUG_PROVIDER_ENV_KEYS);
|
|
2107
|
+
}
|
|
2108
|
+
const inputFile = path.join(os.tmpdir(), `oomi-voice-replay-${randomUUID()}.json`);
|
|
2109
|
+
writeFile(inputFile, JSON.stringify(inputPayload, null, 2) + '\n');
|
|
2110
|
+
|
|
2111
|
+
try {
|
|
2112
|
+
const backend = await runBundledRubyScript({ backendRoot, scriptPath, inputFile, env: childEnv });
|
|
2113
|
+
const parsed = backend.stdout.trim() ? JSON.parse(backend.stdout) : null;
|
|
2114
|
+
return {
|
|
2115
|
+
assistant,
|
|
2116
|
+
backend: parsed,
|
|
2117
|
+
backendExitCode: backend.code,
|
|
2118
|
+
backendStderr: backend.stderr.trim(),
|
|
2119
|
+
liveProvider: Boolean(options.liveProvider),
|
|
2120
|
+
envFile: resolvedEnvFile || null,
|
|
2121
|
+
};
|
|
2122
|
+
} finally {
|
|
2123
|
+
try {
|
|
2124
|
+
fs.unlinkSync(inputFile);
|
|
2125
|
+
} catch {
|
|
2126
|
+
// no-op
|
|
2127
|
+
}
|
|
2128
|
+
}
|
|
2129
|
+
}
|
|
2130
|
+
|
|
2131
|
+
function printTtsPipelineDebugResult(result, asJson) {
|
|
2132
|
+
if (asJson) {
|
|
2133
|
+
console.log(JSON.stringify(result, null, 2));
+    return;
+  }
+
+  console.log(`Assistant normalization: ${result.assistant.changed ? 'changed' : 'unchanged'}${result.assistant.reason ? ` (${result.assistant.reason})` : ''}`);
+  console.log(`Assistant spoken segments: ${Array.isArray(result.assistant.spoken?.segments) ? result.assistant.spoken.segments.length : 0}`);
+  if (!result.backend) {
+    console.log('Backend replay: <no output>');
+    return;
+  }
+  console.log(`Backend replay success: ${result.backend.success ? 'yes' : 'no'}`);
+  console.log(`Managed speech sidecar: ${result.backend.managed?.assistantSpeechFinal?.present ? 'yes' : 'no'}`);
+  console.log(`Backend final text: ${result.backend.qwen?.assistantTextFinal || '<missing>'}`);
+  console.log(`Backend TTS appends: ${Array.isArray(result.backend.qwen?.ttsAppends) ? result.backend.qwen.ttsAppends.length : 0}`);
+  console.log(`Backend TTS commits: ${Number(result.backend.qwen?.commitCount || 0)}`);
+  if (result.liveProvider) {
+    console.log(`Live provider audio deltas: ${Number(result.backend.qwen?.audioDeltaCount || 0)}`);
+    console.log(`Live provider audio bytes (base64): ${Number(result.backend.qwen?.audioDeltaBytes || 0)}`);
+    console.log(`Live provider timeout: ${result.backend.qwen?.providerTimedOut ? 'yes' : 'no'}`);
+  }
+  if (result.backend.qwen?.errorCode) {
+    console.log(`Backend error: ${result.backend.qwen.errorCode}`);
+  }
+  if (result.backendStderr) {
+    console.log(`Backend stderr: ${result.backendStderr}`);
+  }
+}
+
+async function handleOpenclawDebugCommand(action, flags) {
+  const normalizedAction = String(action || '').trim().toLowerCase();
+  const frameFile =
+    typeof flags['frame-file'] === 'string' && flags['frame-file'].trim()
+      ? path.resolve(flags['frame-file'])
+      : '';
+  const frameText =
+    frameFile
+      ? readFile(frameFile)
+      : (typeof flags['frame-json'] === 'string' && flags['frame-json'].trim() ? flags['frame-json'] : '');
+  const text = typeof flags.text === 'string' ? flags.text : '';
+
+  if (!frameText && !text.trim()) {
+    throw new Error(
+      'Assistant text or frame input is required. Usage: oomi openclaw debug assistant-final --text "<assistant text>"'
+    );
+  }
+
+  const debugOptions = {
+    sessionId: flags['session-id'],
+    sessionKey: flags['session-key'],
+    role: flags.role,
+    omitRole: isTruthyFlag(flags['omit-role']),
+    text,
+    frameText,
+    root: flags.root,
+    userText: flags['user-text'],
+    liveProvider: isTruthyFlag(flags['live-provider']),
+    envFile: flags['env-file'],
+    providerTimeoutMs: flags['provider-timeout-ms'],
+  };
+
+  if (normalizedAction === 'assistant-final') {
+    const result = runAssistantFinalDebugCheck(debugOptions);
+    printAssistantFinalDebugResult(result, isTruthyFlag(flags.json));
+    return;
+  }
+
+  if (normalizedAction === 'tts-pipeline') {
+    const result = await runLocalTtsPipelineDebugCheck(debugOptions);
+    printTtsPipelineDebugResult(result, isTruthyFlag(flags.json));
+    if (!result.backend?.success) {
+      throw new Error(result.backend?.qwen?.errorCode || 'Local backend TTS replay failed.');
+    }
+    return;
+  }
+
+  throw new Error('Unknown debug action: ' + normalizedAction + '. Use: oomi openclaw debug assistant-final|tts-pipeline');
+}
 
 function extractCorrelationId(params) {
   if (!params || typeof params !== 'object') return '';
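The new `handleOpenclawDebugCommand` resolves its input in a fixed order: a non-blank `--frame-file` wins, otherwise a non-blank `--frame-json` is used, and `--text` is read independently. It fails only when the frame and the text are both empty. A minimal sketch of that precedence, pulled out into pure functions so it can be exercised without touching the filesystem. `resolveDebugInput`, `resolveDebugAction`, `KNOWN_DEBUG_ACTIONS`, and the injected `readFileFn` are hypothetical names, and the sketch skips the `path.resolve` call the original applies to the frame path.

```javascript
// Hypothetical sketch: the input precedence used by handleOpenclawDebugCommand,
// extracted into pure functions. `readFileFn` stands in for the package's readFile
// helper; the original also runs the frame path through path.resolve, omitted here.
function resolveDebugInput(flags, readFileFn) {
  // A non-blank --frame-file wins over --frame-json.
  const frameFile =
    typeof flags['frame-file'] === 'string' && flags['frame-file'].trim()
      ? flags['frame-file']
      : '';
  const frameJson =
    typeof flags['frame-json'] === 'string' && flags['frame-json'].trim()
      ? flags['frame-json']
      : '';
  const frameText = frameFile ? readFileFn(frameFile) : frameJson;
  // --text is read independently of the frame inputs.
  const text = typeof flags.text === 'string' ? flags.text : '';

  // Fail only when there is neither a frame nor non-blank text.
  if (!frameText && !text.trim()) {
    throw new Error('Assistant text or frame input is required.');
  }
  return { frameText, text };
}

const KNOWN_DEBUG_ACTIONS = new Set(['assistant-final', 'tts-pipeline']);

// Case- and whitespace-insensitive action lookup, as in the original.
function resolveDebugAction(action) {
  const normalized = String(action || '').trim().toLowerCase();
  if (!KNOWN_DEBUG_ACTIONS.has(normalized)) {
    throw new Error(`Unknown debug action: ${normalized}`);
  }
  return normalized;
}
```

In the original, the input check runs before the action check, so an unknown action given with no input reports the missing-input error first.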
@@ -2987,18 +3441,17 @@ async function startOpenclawBridge(flags) {
 
   gatewaySocket.on('message', runBridgeCallbackSafely((gatewayRaw) => {
     let frame = typeof gatewayRaw === 'string' ? gatewayRaw : gatewayRaw.toString();
-
-
-
-      if (spokenNormalized.
-        frame = spokenNormalized.frameText;
+    const spokenNormalized = normalizeAssistantGatewayFrame(sessionId, frame);
+    if (spokenNormalized.changed) {
+      frame = spokenNormalized.frameText;
+      if (spokenNormalized.scope === 'voice') {
         console.log(`[bridge] voice.spoken_metadata.${spokenNormalized.reason} ${sessionId} ${JSON.stringify({
-          before:
+          before: spokenNormalized.summary,
           after: summarizeVoiceFrameContract(frame),
         })}`);
-      } else if (beforeSummary.event === 'chat' && beforeSummary.state === 'final') {
-        console.log(`[bridge] voice.chat.final ${sessionId} ${JSON.stringify(beforeSummary)}`);
       }
+    } else if (spokenNormalized.scope === 'voice' && spokenNormalized.summary.event === 'chat' && spokenNormalized.summary.state === 'final') {
+      console.log(`[bridge] voice.chat.final ${sessionId} ${JSON.stringify(spokenNormalized.summary)}`);
     }
     const gatewayPayload = parseJsonPayload(frame);
     if (gatewayPayload?.event === 'connect.challenge') {
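After this change the bridge delegates detection to `normalizeAssistantGatewayFrame(sessionId, frame)` and only branches on its result: a changed frame always replaces the outgoing one, the `voice.spoken_metadata.<reason>` line is logged only for voice-scoped frames, and `voice.chat.final` is logged only for voice-scoped chat finals whose frame was left unchanged. The sketch below reproduces that branch structure as a pure function. `planBridgeFrame` is a hypothetical name, the result shape is assumed from the fields the handler reads (`changed`, `frameText`, `scope`, `reason`, `summary`), and the JSON payloads the real log lines carry are omitted.

```javascript
// Hypothetical sketch of the branch structure in the bridge's gateway 'message' handler.
// `normalized` is assumed to look like { changed, frameText, scope, reason, summary },
// i.e. the fields the handler reads from normalizeAssistantGatewayFrame().
function planBridgeFrame(sessionId, frame, normalized) {
  const logs = [];
  let outFrame = frame;

  if (normalized.changed) {
    // Any rewritten frame replaces the original, whatever its scope.
    outFrame = normalized.frameText;
    if (normalized.scope === 'voice') {
      logs.push(`[bridge] voice.spoken_metadata.${normalized.reason} ${sessionId}`);
    }
  } else if (
    normalized.scope === 'voice' &&
    normalized.summary.event === 'chat' &&
    normalized.summary.state === 'final'
  ) {
    // Unchanged voice-scoped chat finals are still logged.
    logs.push(`[bridge] voice.chat.final ${sessionId}`);
  }

  return { frame: outFrame, logs };
}
```

Keeping the frame rewrite outside the `scope === 'voice'` check means non-voice frames are still normalized; only the logging is voice-specific.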
@@ -4139,10 +4592,15 @@ async function main() {
     return;
   }
 
-  if (command === 'openclaw' && subcommand === 'plugin') {
-    printOpenclawPluginSetup(args.flags);
-    return;
-  }
+  if (command === 'openclaw' && subcommand === 'plugin') {
+    printOpenclawPluginSetup(args.flags);
+    return;
+  }
+
+  if (command === 'openclaw' && subcommand === 'debug') {
+    await handleOpenclawDebugCommand(args.positionals[0], args.flags);
+    return;
+  }
 
   if (command === 'personas' && subcommand === 'sync') {
     await syncPersonas({ backendUrl: args.flags['backend-url'], root: args.flags.root });
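`main` routes `oomi openclaw debug <action>` by passing `args.positionals[0]` and `args.flags` to `handleOpenclawDebugCommand`. The package's own argument parser is not part of this diff, so the sketch below is an illustrative stand-in, not its real behavior. It assumes `--name value` and `--name=value` forms, treats the flags the handler reads through `isTruthyFlag` (`json`, `live-provider`, `omit-role`) as value-less booleans, and assumes the first two positionals are the command and subcommand. `parseCliArgs`, `BOOLEAN_FLAGS`, and this `isTruthyFlag` are hypothetical.

```javascript
// Hypothetical argument parser; NOT the package's own, whose code is not shown here.
// Flags the handler reads via isTruthyFlag(); treating them as value-less is an assumption.
const BOOLEAN_FLAGS = new Set(['json', 'live-provider', 'omit-role']);

function parseCliArgs(argv) {
  const positionals = [];
  const flags = {};
  for (let i = 0; i < argv.length; i += 1) {
    const token = argv[i];
    if (!token.startsWith('--')) {
      positionals.push(token);
      continue;
    }
    const body = token.slice(2);
    const eq = body.indexOf('=');
    if (eq !== -1) {
      // --name=value
      flags[body.slice(0, eq)] = body.slice(eq + 1);
    } else if (BOOLEAN_FLAGS.has(body) || i + 1 >= argv.length || argv[i + 1].startsWith('--')) {
      // Bare boolean flag, or a flag with no value after it.
      flags[body] = true;
    } else {
      // --name value: consume the next token.
      flags[body] = argv[i + 1];
      i += 1;
    }
  }
  const [command = '', subcommand = '', ...rest] = positionals;
  return { command, subcommand, positionals: rest, flags };
}

// Hypothetical stand-in for the package's isTruthyFlag helper.
function isTruthyFlag(value) {
  return value === true || ['1', 'true', 'yes'].includes(String(value).toLowerCase());
}
```

With this shape, `oomi openclaw debug tts-pipeline --text "Hello there" --live-provider --json` would reach the handler with `tts-pipeline` as the action. Listing the boolean flags explicitly keeps a bare `--json` from swallowing a positional that follows it.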
@@ -4257,7 +4715,9 @@ if (__isDirectExecution) {
 
 export {
   prepareGatewayFrameForLocalGateway,
-
+  ensureAssistantSpokenMetadata,
+  normalizeAssistantGatewayFrame,
+  runAssistantFinalDebugCheck,
   buildBridgeLaunchAgentPlist,
   classifyBridgeFailure,
   classifyBridgeSessionScope,
package/lib/spokenMetadata.js
CHANGED
@@ -100,15 +100,13 @@ function splitSpeechSegments(text) {
       .split(/(?<=[,;:])\s+/)
       .map((part) => part.trim())
       .filter(Boolean);
-
-    if (clauseParts.length > 1) {
-      for (
-
-
-
-
-      continue;
-    }
+
+    if (clauseParts.length > 1) {
+      for (const part of clauseParts) {
+        segments.push(part);
+      }
+      continue;
+    }
 
     segments.push(segment);
   }
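The rewritten loop in `splitSpeechSegments` pushes each clause separately when a segment splits into more than one clause, and otherwise falls through to pushing the segment unchanged. The lookbehind `(?<=[,;:])` matches the whitespace only when it follows a comma, semicolon, or colon, so the punctuation stays attached to the preceding clause. A self-contained sketch of that step; `splitIntoClauses` is a hypothetical wrapper, and whatever produces the input segments upstream is not reproduced.

```javascript
// Sketch of the clause-splitting step in splitSpeechSegments (hypothetical wrapper name).
// Splits only on whitespace that follows a comma, semicolon, or colon; the lookbehind
// does not consume the punctuation, so it stays on the preceding clause.
function splitIntoClauses(segments) {
  const out = [];
  for (const segment of segments) {
    const clauseParts = segment
      .split(/(?<=[,;:])\s+/)
      .map((part) => part.trim())
      .filter(Boolean);

    // More than one clause: emit each clause as its own segment.
    if (clauseParts.length > 1) {
      for (const part of clauseParts) {
        out.push(part);
      }
      continue;
    }

    // Nothing to split: keep the original segment.
    out.push(segment);
  }
  return out;
}
```

For example, `splitIntoClauses(['First, we wait; then: go'])` returns `['First,', 'we wait;', 'then:', 'go']`. Because the split requires whitespace after the punctuation, a token such as `3:4` stays intact: `splitIntoClauses(['Ratio 3:4 stays'])` returns `['Ratio 3:4 stays']`.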
@@ -230,10 +228,10 @@ function normalizeSpokenMetadata(spoken) {
   return normalized;
 }
 
-function inferSpokenMetadataFromContent(content) {
-  const text = normalizeSpeechText(trimString(content));
-  if (!text) return null;
-  const synthesized = synthesizeSpokenSegments(text);
+function inferSpokenMetadataFromContent(content) {
+  const text = normalizeSpeechText(trimString(content));
+  if (!text) return null;
+  const synthesized = synthesizeSpokenSegments(text);
 
   const normalized = text.toLowerCase();
   const upbeat =
@@ -243,44 +241,44 @@ function inferSpokenMetadataFromContent(content) {
     /\b(sorry|gentle|softly|careful|reassuring|calm|okay|it'?s okay|i know)\b/.test(normalized);
   const curious = /\?/.test(text);
 
-  if (upbeat) {
-    return {
-      text,
-      language: synthesized?.language || 'English',
-      segments: synthesized?.segments,
-      instructions: 'Speak with warm, upbeat conversational energy and natural pacing.',
-      style: { emotion: 'upbeat', energy: 'medium' },
-    };
-  }
-
-  if (gentle) {
-    return {
-      text,
-      language: synthesized?.language || 'English',
-      segments: synthesized?.segments,
-      instructions: 'Speak gently and reassuringly, with a calm pace and soft emphasis.',
-      style: { emotion: 'gentle', energy: 'low' },
-    };
-  }
-
-  if (curious) {
-    return {
-      text,
-      language: synthesized?.language || 'English',
-      segments: synthesized?.segments,
-      instructions: 'Speak naturally with curious, engaged intonation and a conversational pace.',
-      style: { emotion: 'curious', energy: 'medium' },
-    };
-  }
-
-  return {
-    text,
-    language: synthesized?.language || 'English',
-    segments: synthesized?.segments,
-    instructions: 'Speak naturally with light warmth and conversational pacing.',
-    style: { emotion: 'neutral', energy: 'medium' },
-  };
-}
+  if (upbeat) {
+    return normalizeSpokenMetadata({
+      text,
+      language: synthesized?.language || 'English',
+      segments: synthesized?.segments,
+      instructions: 'Speak with warm, upbeat conversational energy and natural pacing.',
+      style: { emotion: 'upbeat', energy: 'medium' },
+    });
+  }
+
+  if (gentle) {
+    return normalizeSpokenMetadata({
+      text,
+      language: synthesized?.language || 'English',
+      segments: synthesized?.segments,
+      instructions: 'Speak gently and reassuringly, with a calm pace and soft emphasis.',
+      style: { emotion: 'gentle', energy: 'low' },
+    });
+  }
+
+  if (curious) {
+    return normalizeSpokenMetadata({
+      text,
+      language: synthesized?.language || 'English',
+      segments: synthesized?.segments,
+      instructions: 'Speak naturally with curious, engaged intonation and a conversational pace.',
+      style: { emotion: 'curious', energy: 'medium' },
+    });
+  }
+
+  return normalizeSpokenMetadata({
+    text,
+    language: synthesized?.language || 'English',
+    segments: synthesized?.segments,
+    instructions: 'Speak naturally with light warmth and conversational pacing.',
+    style: { emotion: 'neutral', energy: 'medium' },
+  });
+}
 
 export {
   inferSpokenMetadataFromContent,
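`inferSpokenMetadataFromContent` picks a style from keyword heuristics checked in a fixed order (upbeat, gentle, curious, neutral). After this change every branch returns through `normalizeSpokenMetadata` instead of a raw object, presumably so inferred metadata gets the same normalization as explicit `metadata.spoken`. A minimal sketch of that priority chain, under these assumptions: the gentle word list is copied from the continuation line in the diff (which appears to belong to `gentle`), `UPBEAT_PATTERN` is a placeholder because the real upbeat expression is outside this excerpt, the `normalize` parameter stands in for `normalizeSpokenMetadata`, plain trimming replaces `normalizeSpeechText(trimString(...))`, segment synthesis is omitted, and `classifyTone`/`inferSpokenSketch`/`TONES` are hypothetical names.

```javascript
// Hypothetical sketch of the tone heuristic in inferSpokenMetadataFromContent.
// GENTLE_PATTERN is copied from the diff; UPBEAT_PATTERN is a placeholder assumption
// because the real upbeat expression is not shown in this excerpt.
const UPBEAT_PATTERN = /\b(great|awesome|excited|congrats)\b/; // assumption
const GENTLE_PATTERN = /\b(sorry|gentle|softly|careful|reassuring|calm|okay|it'?s okay|i know)\b/;

// Instructions and styles copied from the four branches in the diff.
const TONES = {
  upbeat: {
    instructions: 'Speak with warm, upbeat conversational energy and natural pacing.',
    style: { emotion: 'upbeat', energy: 'medium' },
  },
  gentle: {
    instructions: 'Speak gently and reassuringly, with a calm pace and soft emphasis.',
    style: { emotion: 'gentle', energy: 'low' },
  },
  curious: {
    instructions: 'Speak naturally with curious, engaged intonation and a conversational pace.',
    style: { emotion: 'curious', energy: 'medium' },
  },
  neutral: {
    instructions: 'Speak naturally with light warmth and conversational pacing.',
    style: { emotion: 'neutral', energy: 'medium' },
  },
};

function classifyTone(text) {
  const lower = text.toLowerCase();
  // First match wins, mirroring the if-chain order in the diff.
  if (UPBEAT_PATTERN.test(lower)) return 'upbeat';
  if (GENTLE_PATTERN.test(lower)) return 'gentle';
  if (/\?/.test(text)) return 'curious';
  return 'neutral';
}

// `normalize` stands in for normalizeSpokenMetadata; every branch goes through it.
function inferSpokenSketch(content, normalize = (spoken) => spoken) {
  const text = String(content || '').trim();
  if (!text) return null;
  const tone = TONES[classifyTone(text)];
  return normalize({
    text,
    language: 'English',
    instructions: tone.instructions,
    style: { ...tone.style },
  });
}
```

Ordering matters: `classifyTone('Okay, are you sure?')` returns `'gentle'` rather than `'curious'`, because the gentle check runs before the question-mark check.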
package/openclaw.plugin.json
CHANGED