npm - oomi-ai - Versions diffs - 0.2.24 → 0.2.27 - Mend

oomi-ai 0.2.24 → 0.2.27

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md CHANGED

@@ -4,12 +4,13 @@ OpenClaw channel plugin and bridge tooling for Oomi managed chat and voice.
 ## Current Focus
-`0.2.21` adds the first live persona automation lane:
+`0.2.27` keeps the persona automation lane and adds a usable local managed-voice validation path:
 - WebSpatial-based persona scaffolding for generated Oomi apps
 - a high-level `oomi personas create-managed` command for agent-driven persona creation
 - device-authenticated persona runtime registration and job callbacks
 - automatic bridge-side polling for queued `persona_job` control messages
-- end-to-end local persona startup from a structured orchestration payload
+- one shared spoken-metadata normalizer used by both the extension and the bridge
+- a repo-backed local `tts-pipeline` replay that can validate assistant-final -> backend -> real Qwen TTS before publishing
 This package is for two audiences:
 - OpenClaw operators who need to connect a machine to Oomi and keep chat or voice healthy
@@ -148,6 +149,46 @@ For managed cloned-voice replies, the canonical contract is:
 The backend cloned-voice path is intentionally strict. If `metadata.spoken` does not reach Oomi, backend TTS fails instead of speaking a flat fallback voice.
+## Local TTS Validation
+If you are developing this package inside the Oomi repo, you can now validate the managed voice path locally before publishing.
+This local gate does three things:
+- replays an assistant `chat.final` frame through the same spoken-metadata normalization path used by the OpenClaw extension and the bridge
+- feeds that normalized frame into the Rails backend replay harness
+- optionally calls the real Qwen cloned-voice provider and confirms that audio deltas come back
+Important:
+- this is a repo developer workflow, not a generic npm-only operator command
+- it expects the Oomi repo checkout, the Rails backend, and local provider env vars
+- the real-provider replay can auto-enroll a disposable default sample voice profile from `assets/voice/source/nemu-enrollment-sample.mp3`
+Assistant-final contract only:
+```bash
+oomi openclaw debug assistant-final --text "Hey Justin! How is the testing going?" --json
+```
+Full local backend replay:
+```bash
+oomi openclaw debug tts-pipeline --text "When your voice reaches me, it gets turned into text, I read it and think about it, then I speak back through the managed chat session." --json
+```
+Real Qwen provider replay:
+```bash
+oomi openclaw debug tts-pipeline --text "When your voice reaches me, it gets turned into text, I read it and think about it, then I speak back through the managed chat session." --live-provider --env-file .env.local --provider-timeout-ms 20000 --json
+```
+What a good result looks like:
+- `backend.success = true`
+- `managed.assistantSpeechFinal.present = true`
+- `qwen.errorCode = null`
+- `qwen.audioDeltaCount > 0` when `--live-provider` is used
+This is the preferred pre-publish gate for managed voice regressions, because it is much faster than publishing to npm and testing through a live OpenClaw machine first.
 ## Persona Scaffolding
 Use the scaffold flow when OpenClaw needs to build a managed persona app that will live inside Oomi:
@@ -192,7 +233,7 @@ The bridge status file is written locally and should roughly be interpreted as:
 For voice support, a `voice_session_*` failure should be treated as narrower than a full provider outage.
-## Troubleshooting
+## Troubleshooting
 ### `invalid handshake: first request must be connect`
@@ -223,19 +264,32 @@ What to check:
 If the process is alive but runtime faults are being caught, expect `degraded` rather than an immediate hard stop.
-### Voice STT works but the agent does not answer
+### Voice STT works but the agent does not answer
 This usually means one of these:
 - the managed gateway/device side is not actually ready
 - the bridge or agent run failed after delivery
 - the OpenClaw run stopped with an upstream provider `network_error`
-In that situation, inspect:
-- `~/.openclaw/logs/gateway.log`
-- `~/.openclaw/logs/gateway.err.log`
-- the relevant session JSONL in `~/.openclaw/agents/main/sessions/`
+In that situation, inspect:
+- `~/.openclaw/logs/gateway.log`
+- `~/.openclaw/logs/gateway.err.log`
+- the relevant session JSONL in `~/.openclaw/agents/main/sessions/`
+### Voice text works but cloned TTS fails with `MISSING_SPOKEN_METADATA`
+Meaning:
+- the assistant text arrived
+- the backend voice relay never received valid hidden `metadata.spoken`
+What to check:
+- run the local replay gate before publishing:
+  - `oomi openclaw debug assistant-final --text "..."`
+  - `oomi openclaw debug tts-pipeline --text "..."`
+- if the package local replay succeeds but the live machine fails, verify the OpenClaw machine is actually running the updated bridge binary
+- if the local replay fails, fix the assistant-final contract first instead of debugging the browser or backend deployment
-## Developer Notes
+## Developer Notes
 If you are inspecting this package on npm, the main architectural points are:
 - the extension path is the stable managed text contract
@@ -248,24 +302,38 @@ If you are inspecting this package on npm, the main architectural points are:
   - runtime fault isolation so local session failures are less likely to crash the whole provider
   - one shared hidden managed-voice speech metadata helper used by both the extension and the local bridge
-If you are developing the plugin, test the packaged surface with:
-```bash
-cd packages/oomi-ai
-node --test test/*.test.mjs
-npm pack --dry-run
-```
-## Release Process
-Before publishing:
-```bash
-cd packages/oomi-ai
-node --test test/*.test.mjs
-npm pack --dry-run
-```
+If you are developing the plugin, test the packaged surface with:
+```bash
+cd packages/oomi-ai
+node --test test/*.test.mjs
+npm pack --dry-run
+```
+For managed voice changes, do not stop at the package tests. Run the local replay gate from the repo root as well, especially before publishing:
+```bash
+oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --json
+oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --live-provider --env-file .env.local --provider-timeout-ms 20000 --json
+```
+## Release Process
+Before publishing:
+```bash
+cd packages/oomi-ai
+node --test test/*.test.mjs
+npm pack --dry-run
+```
+For voice-related changes, also run the repo-backed local replay gate before publish:
+```bash
+oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --json
+oomi openclaw debug tts-pipeline --text "Local managed voice validation text." --live-provider --env-file .env.local --provider-timeout-ms 20000 --json
+```
 Then publish the bumped version:
 ```bash
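The README's "What a good result looks like" list can be checked mechanically against the `--json` output. The sketch below uses a hypothetical helper, `checkTtsPipelineResult`, which is not part of oomi-ai.

Its field paths follow `printTtsPipelineDebugResult` in the `bin/oomi-ai.js` diff, where the backend replay JSON is nested under `backend`. It assumes that a missing `qwen.errorCode` means the same as `null`.

```javascript
// Hypothetical helper (not part of oomi-ai): compares the JSON printed by
// `oomi openclaw debug tts-pipeline --json` with the README's success criteria.
// Field paths follow printTtsPipelineDebugResult in bin/oomi-ai.js.
function checkTtsPipelineResult(result) {
  const backend = result && result.backend;
  if (!backend) {
    return ['backend replay produced no JSON output'];
  }
  const problems = [];
  if (backend.success !== true) {
    problems.push('backend.success is not true');
  }
  if (backend.managed?.assistantSpeechFinal?.present !== true) {
    problems.push('managed.assistantSpeechFinal.present is not true');
  }
  // Assumption: a missing errorCode is treated the same as null.
  if (backend.qwen?.errorCode != null) {
    problems.push(`qwen.errorCode is ${backend.qwen.errorCode}`);
  }
  // Audio deltas are only expected when --live-provider was passed.
  if (result.liveProvider && !(Number(backend.qwen?.audioDeltaCount) > 0)) {
    problems.push('qwen.audioDeltaCount is not > 0');
  }
  return problems;
}

// Usage with a hand-written sample shaped like a passing live-provider run.
const sample = {
  liveProvider: true,
  backend: {
    success: true,
    managed: { assistantSpeechFinal: { present: true } },
    qwen: { errorCode: null, audioDeltaCount: 12 },
  },
};
console.log(checkTtsPipelineResult(sample)); // []
```

In practice the input would be `JSON.parse` of the command's stdout. `handleOpenclawDebugCommand` prints the result before it throws on a failed replay, so failing runs still produce JSON to inspect.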

package/bin/oomi-ai.js CHANGED

@@ -39,13 +39,21 @@ const BRIDGE_CONNECT_CHALLENGE_TIMEOUT_MS = parsePositiveInteger(
   process.env.OOMI_BRIDGE_CONNECT_CHALLENGE_TIMEOUT_MS,
   3000
 );
-const BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS = parsePositiveInteger(
-  process.env.OOMI_BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS,
-  30000
-);
-const BRIDGE_LAUNCHD_LABEL = 'ai.oomi.bridge';
-const DEVICE_IDENTITY_PATH = path.join(os.homedir(), '.openclaw', 'identity', 'device.json');
-const ED25519_SPKI_PREFIX = Buffer.from('302a300506032b6570032100', 'hex');
+const BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS = parsePositiveInteger(
+  process.env.OOMI_BRIDGE_GATEWAY_REQUEST_TIMEOUT_MS,
+  30000
+);
+const BRIDGE_LAUNCHD_LABEL = 'ai.oomi.bridge';
+const DEBUG_PROVIDER_ENV_KEYS = [
+  'QWEN_REALTIME_API_KEY',
+  'QWEN_REALTIME_BASE_URL',
+  'QWEN_REALTIME_ASR_MODEL',
+  'QWEN_REALTIME_TTS_MODEL',
+  'QWEN_REALTIME_TTS_VOICE',
+  'QWEN_REALTIME_LANGUAGE',
+];
+const DEVICE_IDENTITY_PATH = path.join(os.homedir(), '.openclaw', 'identity', 'device.json');
+const ED25519_SPKI_PREFIX = Buffer.from('302a300506032b6570032100', 'hex');
 function parsePositiveInteger(value, fallback) {
   const num = Number(value);
@@ -169,10 +177,14 @@ Commands:
   openclaw install
     Install agent instructions and the Oomi skill into OpenClaw.
-  openclaw bridge [start|ensure|stop|restart|ps]
-    Manage local OpenClaw-to-Oomi bridge lifecycle (singleton).
-  openclaw bridge service [install|start|stop|restart|status|uninstall]
-    Manage macOS launchd bridge supervision.
+  openclaw bridge [start|ensure|stop|restart|ps]
+    Manage local OpenClaw-to-Oomi bridge lifecycle (singleton).
+  openclaw bridge service [install|start|stop|restart|status|uninstall]
+    Manage macOS launchd bridge supervision.
+  openclaw debug assistant-final
+    Replay an assistant chat.final frame through spoken-metadata normalization.
+  openclaw debug tts-pipeline
+    Replay an assistant chat.final through local backend voice handling.
   openclaw pair
     Pair this OpenClaw host with Oomi and start bridge (single command).
@@ -225,9 +237,20 @@ Common flags:
   --device-id ID         Bridge device identifier (default: host name)
   --device-token TOKEN   Existing bridge device token
   --show-secrets         Print full token values in diagnostic output
-  --json                 Print pairing result as JSON (for automation)
-  --backend-url URL      Override Oomi backend URL
-  --root PATH            Override repo root path for persona discovery
+  --json                 Print pairing result as JSON (for automation)
+  --text TEXT            Assistant text for local debug frame replay
+  --frame-file PATH      Read a raw gateway frame from disk for local debug replay
+  --frame-json JSON      Use raw gateway frame JSON text for local debug replay
+  --session-id ID        Debug session id override (default: ms_debug_local)
+  --user-text TEXT       User utterance text used for backend voice replay
+  --live-provider        Use the real Qwen TTS provider in local debug replay
+  --env-file PATH        Load provider env vars from a specific env file (default: <repo>/.env.local)
+  --provider-timeout-ms N
+                        Timeout in ms for live provider audio during local debug replay
+  --backend-url URL      Override Oomi backend URL
+  --root PATH            Override repo root path for persona discovery
+  --role ROLE            Message role override for local debug frame replay
+  --omit-role            Omit message.role in the generated local debug frame
   --name NAME            Persona display name (for create)
   --description TEXT     Persona description (for scaffold)
   --slug SLUG            Explicit slug override (for create-managed)
@@ -261,13 +284,43 @@ function readFile(filePath) {
   return fs.readFileSync(filePath, 'utf-8');
 }
-function writeFile(filePath, content, options = undefined) {
-  fs.writeFileSync(filePath, content, options);
-}
-function xmlEscape(value) {
-  return String(value)
-    .replaceAll('&', '&amp;')
+function writeFile(filePath, content, options = undefined) {
+  fs.writeFileSync(filePath, content, options);
+}
+function parseDotEnvLine(line) {
+  const trimmed = String(line || '').trim();
+  if (!trimmed || trimmed.startsWith('#')) return null;
+  const separatorIndex = trimmed.indexOf('=');
+  if (separatorIndex <= 0) return null;
+  const key = trimmed.slice(0, separatorIndex).trim();
+  if (!key) return null;
+  let value = trimmed.slice(separatorIndex + 1).trim();
+  if ((value.startsWith('"') && value.endsWith('"')) || (value.startsWith("'") && value.endsWith("'"))) {
+    value = value.slice(1, -1);
+  }
+  return { key, value };
+}
+function loadEnvFile(filePath, keys = []) {
+  if (!filePath || !fs.existsSync(filePath)) {
+    throw new Error(`Environment file not found: ${filePath}`);
+  }
+  const selectedKeys = Array.isArray(keys) && keys.length ? new Set(keys) : null;
+  const entries = {};
+  const lines = readFile(filePath).split(/\r?\n/);
+  for (const line of lines) {
+    const parsed = parseDotEnvLine(line);
+    if (!parsed) continue;
+    if (selectedKeys && !selectedKeys.has(parsed.key)) continue;
+    entries[parsed.key] = parsed.value;
+  }
+  return entries;
+}
+function xmlEscape(value) {
+  return String(value)
+    .replaceAll('&', '&amp;')
     .replaceAll('<', '&lt;')
     .replaceAll('>', '&gt;')
     .replaceAll('"', '&quot;')
@@ -356,9 +409,9 @@ function ensureDir(dirPath) {
   }
 }
-function findRepoRoot(startDir) {
-  let current = startDir;
-  for (let i = 0; i < 6; i += 1) {
+function findRepoRoot(startDir) {
+  let current = startDir;
+  for (let i = 0; i < 6; i += 1) {
     const personasDir = path.join(current, 'personas');
     const skillsDir = path.join(current, 'skills', 'oomi');
     if (fs.existsSync(personasDir) || fs.existsSync(skillsDir)) {
@@ -367,11 +420,23 @@ function findRepoRoot(startDir) {
     const parent = path.dirname(current);
     if (parent === current) break;
     current = parent;
-  }
-  return null;
-}
-function resolveSkillSource(cliRoot) {
+  }
+  return null;
+}
+function resolveRepoRoot(rootFlag) {
+  const explicitRoot =
+    typeof rootFlag === 'string' && rootFlag.trim()
+      ? path.resolve(rootFlag.trim())
+      : '';
+  const repoRoot = explicitRoot || findRepoRoot(process.cwd()) || findRepoRoot(PACKAGE_ROOT);
+  if (!repoRoot) {
+    throw new Error('Could not locate repo root. Use --root <repo root>.');
+  }
+  return repoRoot;
+}
+function resolveSkillSource(cliRoot) {
   const packaged = path.join(PACKAGE_ROOT, 'skills', 'oomi');
   if (fs.existsSync(packaged)) {
     return packaged;
@@ -1698,7 +1763,7 @@ function summarizeVoiceFrameContract(frameText) {
   };
 }
-function ensureVoiceAssistantSpokenMetadata(frameText) {
+function ensureAssistantSpokenMetadata(frameText) {
   const frame = parseJsonPayload(frameText);
   if (!frame || typeof frame !== 'object') {
     return { frameText, changed: false, reason: '' };
@@ -1753,6 +1818,395 @@ function ensureVoiceAssistantSpokenMetadata(frameText) {
     reason: normalizedExplicitSpoken ? 'normalized' : (messageRole ? 'synthesized' : 'synthesized_missing_role'),
   };
 }
+function normalizeAssistantGatewayFrame(sessionId, frameText) {
+  const scope = classifyBridgeSessionScope(sessionId);
+  const summary = summarizeVoiceFrameContract(frameText);
+  if (!summary.parseable || summary.event !== 'chat' || summary.state !== 'final') {
+    return {
+      frameText,
+      changed: false,
+      reason: '',
+      scope,
+      summary,
+    };
+  }
+  const normalized = ensureAssistantSpokenMetadata(frameText);
+  return {
+    ...normalized,
+    scope,
+    summary,
+  };
+}
+function buildAssistantFinalDebugFrame({ sessionKey, text, role }) {
+  const trimmedSessionKey =
+    typeof sessionKey === 'string' && sessionKey.trim()
+      ? sessionKey.trim()
+      : 'agent:main:webchat:channel:oomi';
+  const message = {
+    content: String(text || ''),
+  };
+  if (typeof role === 'string' && role.trim()) {
+    message.role = role.trim();
+  }
+  return JSON.stringify({
+    type: 'event',
+    event: 'chat',
+    payload: {
+      sessionKey: trimmedSessionKey,
+      state: 'final',
+      message,
+    },
+  });
+}
+function extractSpokenMetadata(frameText) {
+  const payload = parseJsonPayload(frameText);
+  const message =
+    payload &&
+    payload.payload &&
+    typeof payload.payload === 'object' &&
+    payload.payload.message &&
+    typeof payload.payload.message === 'object'
+      ? payload.payload.message
+      : null;
+  const metadata =
+    message &&
+    message.metadata &&
+    typeof message.metadata === 'object' &&
+    !Array.isArray(message.metadata)
+      ? message.metadata
+      : {};
+  return normalizeSpokenMetadata(metadata.spoken);
+}
+function runAssistantFinalDebugCheck(options = {}) {
+  const sessionId =
+    typeof options.sessionId === 'string' && options.sessionId.trim()
+      ? options.sessionId.trim()
+      : 'ms_debug_local';
+  const sessionKey =
+    typeof options.sessionKey === 'string' && options.sessionKey.trim()
+      ? options.sessionKey.trim()
+      : 'agent:main:webchat:channel:oomi';
+  const role =
+    options.omitRole
+      ? ''
+      : (typeof options.role === 'string' && options.role.trim() ? options.role.trim() : 'assistant');
+  const rawFrameText =
+    typeof options.frameText === 'string' && options.frameText.trim()
+      ? options.frameText
+      : buildAssistantFinalDebugFrame({
+          sessionKey,
+          text: options.text,
+          role,
+        });
+  const before = summarizeVoiceFrameContract(rawFrameText);
+  const normalized = normalizeAssistantGatewayFrame(sessionId, rawFrameText);
+  const after = summarizeVoiceFrameContract(normalized.frameText);
+  const spoken = extractSpokenMetadata(normalized.frameText);
+  return {
+    sessionId,
+    sessionKey,
+    scope: normalized.scope,
+    changed: normalized.changed,
+    reason: normalized.reason,
+    before,
+    after,
+    spoken,
+    frameText: normalized.frameText,
+  };
+}
+function printAssistantFinalDebugResult(result, asJson) {
+  if (asJson) {
+    console.log(JSON.stringify(result, null, 2));
+    return;
+  }
+  console.log(`Session id: ${result.sessionId}`);
+  console.log(`Session key: ${result.sessionKey}`);
+  console.log(`Scope: ${result.scope}`);
+  console.log(`Changed: ${result.changed ? 'yes' : 'no'}${result.reason ? ` (${result.reason})` : ''}`);
+  console.log(
+    `Before: event=${result.before.event || '<none>'} state=${result.before.state || '<none>'} role=${result.before.role || '<none>'} spoken=${result.before.spokenNormalized ? 'yes' : 'no'}`
+  );
+  console.log(
+    `After: event=${result.after.event || '<none>'} state=${result.after.state || '<none>'} role=${result.after.role || '<none>'} spoken=${result.after.spokenNormalized ? 'yes' : 'no'}`
+  );
+  if (result.spoken) {
+    console.log(`Spoken text: ${result.spoken.text}`);
+    console.log(`Segments: ${Array.isArray(result.spoken.segments) ? result.spoken.segments.length : 0}`);
+    if (typeof result.spoken.instructions === 'string' && result.spoken.instructions.trim()) {
+      console.log(`Instructions: ${result.spoken.instructions}`);
+    }
+  } else {
+    console.log('Spoken text: <missing>');
+  }
+}
+function resolveCommandFromPath(commandName) {
+  const normalized = String(commandName || '').trim();
+  if (!normalized) return '';
+  try {
+    const probe = spawnSync(process.platform === 'win32' ? 'where' : 'which', [normalized], {
+      encoding: 'utf8',
+      stdio: ['ignore', 'pipe', 'ignore'],
+    });
+    if (probe.status !== 0) return '';
+    const firstLine = String(probe.stdout || '')
+      .split(/\r?\n/)
+      .map((line) => line.trim())
+      .find(Boolean);
+    return firstLine || '';
+  } catch {
+    return '';
+  }
+}
+function resolveExecutable(candidates = []) {
+  for (const candidate of candidates) {
+    if (!candidate) continue;
+    const value = String(candidate).trim();
+    if (!value) continue;
+    if (path.isAbsolute(value) && fs.existsSync(value)) {
+      return value;
+    }
+    if (value.includes(path.sep) || value.includes('/')) {
+      const resolved = path.resolve(value);
+      if (fs.existsSync(resolved)) {
+        return resolved;
+      }
+      continue;
+    }
+    const fromPath = resolveCommandFromPath(value);
+    if (fromPath) {
+      return fromPath;
+    }
+  }
+  return '';
+}
+function resolveBackendRoot(rootFlag) {
+  const repoRoot = resolveRepoRoot(rootFlag);
+  const backendRoot = path.join(repoRoot, 'apps', 'backend');
+  if (!fs.existsSync(backendRoot)) {
+    throw new Error(`Could not locate backend app at ${backendRoot}`);
+  }
+  return backendRoot;
+}
+function resolveRubyExecutable() {
+  const candidates = [
+    process.env.OOMI_RUBY_BIN,
+    process.env.RUBY,
+    process.platform === 'win32' ? 'ruby.exe' : 'ruby',
+    process.platform === 'win32' ? 'ruby' : '',
+    process.platform === 'win32' ? 'C:\\Ruby33-x64\\bin\\ruby.exe' : '',
+  ];
+  const executable = resolveExecutable(candidates);
+  if (!executable) {
+    throw new Error('Ruby executable not found. Set OOMI_RUBY_BIN or install Ruby locally.');
+  }
+  return executable;
+}
+function resolveBundleExecutable() {
+  const candidates = [
+    process.env.OOMI_BUNDLE_BIN,
+    process.platform === 'win32' ? 'bundle.bat' : 'bundle',
+    'bundle',
+    process.platform === 'win32' ? 'C:\\Ruby33-x64\\bin\\bundle.bat' : '',
+  ];
+  const executable = resolveExecutable(candidates);
+  if (!executable) {
+    throw new Error('Bundler executable not found. Set OOMI_BUNDLE_BIN or install Bundler locally.');
+  }
+  return executable;
+}
+function shellQuote(value) {
+  const text = String(value);
+  if (process.platform === 'win32') {
+    return `"${text.replace(/"/g, '""')}"`;
+  }
+  return `'${text.replace(/'/g, `'\\''`)}'`;
+}
+async function runBundledRubyScript({ backendRoot, scriptPath, inputFile, env = undefined }) {
+  const rubyExecutable = resolveRubyExecutable();
+  const bundleExecutable = resolveBundleExecutable();
+  const commandText = process.platform === 'win32'
+    ? [bundleExecutable, 'exec', rubyExecutable, scriptPath, '--input-file', inputFile].map(shellQuote).join(' ')
+    : '';
+  const childEnv = env ? { ...process.env, ...env } : process.env;
+  return await new Promise((resolve, reject) => {
+    const child = process.platform === 'win32'
+      ? spawn(commandText, [], {
+          cwd: backendRoot,
+          shell: true,
+          env: childEnv,
+          stdio: ['ignore', 'pipe', 'pipe'],
+        })
+      : spawn(bundleExecutable, ['exec', rubyExecutable, scriptPath, '--input-file', inputFile], {
+          cwd: backendRoot,
+          env: childEnv,
+          stdio: ['ignore', 'pipe', 'pipe'],
+        });
+    let stdout = '';
+    let stderr = '';
+    child.stdout.on('data', (chunk) => {
+      stdout += chunk.toString();
+    });
+    child.stderr.on('data', (chunk) => {
+      stderr += chunk.toString();
+    });
+    child.on('error', reject);
+    child.on('close', (code) => {
+      resolve({ code: Number(code || 0), stdout, stderr });
+    });
+  });
+}
+async function runLocalTtsPipelineDebugCheck(options = {}) {
+  const assistant = runAssistantFinalDebugCheck(options);
+  const repoRoot = resolveRepoRoot(options.root);
+  const backendRoot = resolveBackendRoot(options.root);
+  const scriptPath = path.join(backendRoot, 'bin', 'voice_tts_replay.rb');
+  if (!fs.existsSync(scriptPath)) {
+    throw new Error(`Backend replay script not found: ${scriptPath}`);
+  }
+  const inputPayload = {
+    repoRoot,
+    sessionId: assistant.sessionId,
+    sessionKey: assistant.sessionKey,
+    frameText: assistant.frameText,
+    userText:
+      typeof options.userText === 'string' && options.userText.trim()
+        ? options.userText.trim()
+        : 'local debug utterance',
+    liveProvider: Boolean(options.liveProvider),
+    providerTimeoutMs: parsePositiveInteger(options.providerTimeoutMs, 15000),
+  };
+  let childEnv = undefined;
+  let resolvedEnvFile = '';
+  if (options.liveProvider) {
+    resolvedEnvFile =
+      typeof options.envFile === 'string' && options.envFile.trim()
+        ? path.resolve(options.envFile.trim())
+        : path.join(repoRoot, '.env.local');
+    childEnv = loadEnvFile(resolvedEnvFile, DEBUG_PROVIDER_ENV_KEYS);
+  }
+  const inputFile = path.join(os.tmpdir(), `oomi-voice-replay-${randomUUID()}.json`);
+  writeFile(inputFile, JSON.stringify(inputPayload, null, 2) + '\n');
+  try {
+    const backend = await runBundledRubyScript({ backendRoot, scriptPath, inputFile, env: childEnv });
+    const parsed = backend.stdout.trim() ? JSON.parse(backend.stdout) : null;
+    return {
+      assistant,
+      backend: parsed,
+      backendExitCode: backend.code,
+      backendStderr: backend.stderr.trim(),
+      liveProvider: Boolean(options.liveProvider),
+      envFile: resolvedEnvFile || null,
+    };
+  } finally {
+    try {
+      fs.unlinkSync(inputFile);
+    } catch {
+      // no-op
+    }
+  }
+}
+function printTtsPipelineDebugResult(result, asJson) {
+  if (asJson) {
+    console.log(JSON.stringify(result, null, 2));
+    return;
+  }
+  console.log(`Assistant normalization: ${result.assistant.changed ? 'changed' : 'unchanged'}${result.assistant.reason ? ` (${result.assistant.reason})` : ''}`);
+  console.log(`Assistant spoken segments: ${Array.isArray(result.assistant.spoken?.segments) ? result.assistant.spoken.segments.length : 0}`);
+  if (!result.backend) {
+    console.log('Backend replay: <no output>');
+    return;
+  }
+  console.log(`Backend replay success: ${result.backend.success ? 'yes' : 'no'}`);
+  console.log(`Managed speech sidecar: ${result.backend.managed?.assistantSpeechFinal?.present ? 'yes' : 'no'}`);
+  console.log(`Backend final text: ${result.backend.qwen?.assistantTextFinal || '<missing>'}`);
+  console.log(`Backend TTS appends: ${Array.isArray(result.backend.qwen?.ttsAppends) ? result.backend.qwen.ttsAppends.length : 0}`);
+  console.log(`Backend TTS commits: ${Number(result.backend.qwen?.commitCount || 0)}`);
+  if (result.liveProvider) {
+    console.log(`Live provider audio deltas: ${Number(result.backend.qwen?.audioDeltaCount || 0)}`);
+    console.log(`Live provider audio bytes (base64): ${Number(result.backend.qwen?.audioDeltaBytes || 0)}`);
+    console.log(`Live provider timeout: ${result.backend.qwen?.providerTimedOut ? 'yes' : 'no'}`);
+  }
+  if (result.backend.qwen?.errorCode) {
+    console.log(`Backend error: ${result.backend.qwen.errorCode}`);
+  }
+  if (result.backendStderr) {
+    console.log(`Backend stderr: ${result.backendStderr}`);
+  }
+}
+async function handleOpenclawDebugCommand(action, flags) {
+  const normalizedAction = String(action || '').trim().toLowerCase();
+  const frameFile =
+    typeof flags['frame-file'] === 'string' && flags['frame-file'].trim()
+      ? path.resolve(flags['frame-file'])
+      : '';
+  const frameText =
+    frameFile
+      ? readFile(frameFile)
+      : (typeof flags['frame-json'] === 'string' && flags['frame-json'].trim() ? flags['frame-json'] : '');
+  const text = typeof flags.text === 'string' ? flags.text : '';
+  if (!frameText && !text.trim()) {
+    throw new Error(
+      'Assistant text or frame input is required. Usage: oomi openclaw debug assistant-final --text "<assistant text>"'
+    );
+  }
+  const debugOptions = {
+    sessionId: flags['session-id'],
+    sessionKey: flags['session-key'],
+    role: flags.role,
+    omitRole: isTruthyFlag(flags['omit-role']),
+    text,
+    frameText,
+    root: flags.root,
+    userText: flags['user-text'],
+    liveProvider: isTruthyFlag(flags['live-provider']),
+    envFile: flags['env-file'],
+    providerTimeoutMs: flags['provider-timeout-ms'],
+  };
+  if (normalizedAction === 'assistant-final') {
+    const result = runAssistantFinalDebugCheck(debugOptions);
+    printAssistantFinalDebugResult(result, isTruthyFlag(flags.json));
+    return;
+  }
+  if (normalizedAction === 'tts-pipeline') {
+    const result = await runLocalTtsPipelineDebugCheck(debugOptions);
+    printTtsPipelineDebugResult(result, isTruthyFlag(flags.json));
+    if (!result.backend?.success) {
+      throw new Error(result.backend?.qwen?.errorCode || 'Local backend TTS replay failed.');
+    }
+    return;
+  }
+  throw new Error('Unknown debug action: ' + normalizedAction + '. Use: oomi openclaw debug assistant-final|tts-pipeline');
+}
 function extractCorrelationId(params) {
   if (!params || typeof params !== 'object') return '';
@@ -2987,18 +3441,17 @@ async function startOpenclawBridge(flags) {
       gatewaySocket.on('message', runBridgeCallbackSafely((gatewayRaw) => {
         let frame = typeof gatewayRaw === 'string' ? gatewayRaw : gatewayRaw.toString();
-        if (classifyBridgeSessionScope(sessionId) === 'voice') {
-          const beforeSummary = summarizeVoiceFrameContract(frame);
-          const spokenNormalized = ensureVoiceAssistantSpokenMetadata(frame);
-          if (spokenNormalized.changed) {
-            frame = spokenNormalized.frameText;
+        const spokenNormalized = normalizeAssistantGatewayFrame(sessionId, frame);
+        if (spokenNormalized.changed) {
+          frame = spokenNormalized.frameText;
+          if (spokenNormalized.scope === 'voice') {
             console.log(`[bridge] voice.spoken_metadata.${spokenNormalized.reason} ${sessionId} ${JSON.stringify({
-              before: beforeSummary,
+              before: spokenNormalized.summary,
               after: summarizeVoiceFrameContract(frame),
             })}`);
-          } else if (beforeSummary.event === 'chat' && beforeSummary.state === 'final') {
-            console.log(`[bridge] voice.chat.final ${sessionId} ${JSON.stringify(beforeSummary)}`);
           }
+        } else if (spokenNormalized.scope === 'voice' && spokenNormalized.summary.event === 'chat' && spokenNormalized.summary.state === 'final') {
+          console.log(`[bridge] voice.chat.final ${sessionId} ${JSON.stringify(spokenNormalized.summary)}`);
         }
         const gatewayPayload = parseJsonPayload(frame);
         if (gatewayPayload?.event === 'connect.challenge') {
@@ -4139,10 +4592,15 @@ async function main() {
     return;
   }
-  if (command === 'openclaw' && subcommand === 'plugin') {
-    printOpenclawPluginSetup(args.flags);
-    return;
-  }
+  if (command === 'openclaw' && subcommand === 'plugin') {
+    printOpenclawPluginSetup(args.flags);
+    return;
+  }
+  if (command === 'openclaw' && subcommand === 'debug') {
+    await handleOpenclawDebugCommand(args.positionals[0], args.flags);
+    return;
+  }
   if (command === 'personas' && subcommand === 'sync') {
     await syncPersonas({ backendUrl: args.flags['backend-url'], root: args.flags.root });
@@ -4257,7 +4715,9 @@ if (__isDirectExecution) {
 export {
   prepareGatewayFrameForLocalGateway,
-  ensureVoiceAssistantSpokenMetadata,
+  ensureAssistantSpokenMetadata,
+  normalizeAssistantGatewayFrame,
+  runAssistantFinalDebugCheck,
   buildBridgeLaunchAgentPlist,
   classifyBridgeFailure,
   classifyBridgeSessionScope,
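The new `--frame-file` and `--frame-json` flags accept a raw gateway frame instead of `--text`. The minimal sketch below builds a frame with the same layout that `buildAssistantFinalDebugFrame` generates.

`makeChatFinalFrame` and `isNormalizationCandidate` are hypothetical names. The candidate check assumes that `summarizeVoiceFrameContract` (not shown in this diff) reads `event` and `payload.state` from the parsed frame.

```javascript
// Hypothetical helpers: build a chat.final gateway frame with the same layout
// as buildAssistantFinalDebugFrame, for use with --frame-json or --frame-file.
function makeChatFinalFrame(text, options = {}) {
  const sessionKey = options.sessionKey || 'agent:main:webchat:channel:oomi';
  // Default role matches the 'assistant' default in runAssistantFinalDebugCheck;
  // pass role: '' to mimic --omit-role.
  const role = options.role === undefined ? 'assistant' : options.role;
  const message = { content: String(text) };
  if (role) {
    message.role = role;
  }
  return JSON.stringify({
    type: 'event',
    event: 'chat',
    payload: { sessionKey, state: 'final', message },
  });
}

// Mirrors the early return in normalizeAssistantGatewayFrame: only parseable
// chat frames in the final state are normalized. Assumes the summary reads
// `event` and `payload.state` from the parsed frame.
function isNormalizationCandidate(frameText) {
  try {
    const frame = JSON.parse(frameText);
    return frame?.event === 'chat' && frame?.payload?.state === 'final';
  } catch {
    return false;
  }
}

const frameText = makeChatFinalFrame('Hey Justin! How is the testing going?');
console.log(frameText); // save to frame.json, then pass --frame-file frame.json
```

Suppose the saved frame's `state` is changed to something other than `final`. Then `assistant-final` should report `Changed: no`, because `normalizeAssistantGatewayFrame` passes such frames through untouched.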

package/lib/spokenMetadata.js CHANGED

@@ -1,6 +1,10 @@
-function trimString(value, fallback = '') {
-  return typeof value === 'string' && value.trim() ? value.trim() : fallback;
-}
+function trimString(value, fallback = '') {
+  return typeof value === 'string' && value.trim() ? value.trim() : fallback;
+}
+function stripAvatarCommandTags(text) {
+  return text.replace(/\[(anim|animation|face|expression|emotion|gesture|look|gaze):[^\]]+\]/gi, ' ');
+}
 function clampInteger(value, fallback, { min = 1, max = Number.MAX_SAFE_INTEGER } = {}) {
   if (typeof value !== 'number' || !Number.isFinite(value)) return fallback;
@@ -35,11 +39,11 @@ function inferSpokenLanguage(text) {
   return 'English';
 }
-function normalizeSpokenSegment(segment) {
-  if (!segment || typeof segment !== 'object' || Array.isArray(segment)) return null;
-  const text = trimString(segment.text);
-  if (!text) return null;
+function normalizeSpokenSegment(segment) {
+  if (!segment || typeof segment !== 'object' || Array.isArray(segment)) return null;
+  const text = normalizeSpeechText(trimString(segment.text));
+  if (!text) return null;
   const normalized = { text };
   const pace = trimString(segment.pace);
@@ -61,11 +65,11 @@ function stripEmoji(text) {
   return text.replace(/[\uFE0E\uFE0F]/g, '').replace(/\p{Extended_Pictographic}|\p{Emoji_Presentation}/gu, '');
 }
-function normalizeSpeechText(text) {
-  return stripEmoji(text)
-    .replace(/\*\*(.*?)\*\*/g, '$1')
-    .replace(/__(.*?)__/g, '$1')
-    .replace(/`([^`]+)`/g, '$1')
+function normalizeSpeechText(text) {
+  return stripEmoji(stripAvatarCommandTags(text))
+    .replace(/\*\*(.*?)\*\*/g, '$1')
+    .replace(/__(.*?)__/g, '$1')
+    .replace(/`([^`]+)`/g, '$1')
     .replace(/[\u2013\u2014]/g, ', ')
     .replace(/\u2026/g, '...')
     .replace(/\s+/g, ' ')
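To make the 0.2.27 ordering concrete, here is a minimal sketch that chains only the `normalizeSpeechText` steps visible in this hunk. Avatar command tags are stripped first, then emoji, then Markdown emphasis and inline code, then dashes and ellipses, then whitespace. The diff elides a few lines before the final `.trim()`, so `normalizeSpeechTextSketch` is a hypothetical partial reconstruction, not the package's own function.

```javascript
// Hypothetical, partial re-creation of normalizeSpeechText from oomi-ai 0.2.27.
// Only the steps visible in the diff are reproduced; elided lines are omitted.

// Copied from the diff: replaces [anim:...], [face:...], [gaze:...] etc. with a space.
function stripAvatarCommandTags(text) {
  return text.replace(/\[(anim|animation|face|expression|emotion|gesture|look|gaze):[^\]]+\]/gi, ' ');
}

// Copied from the diff: drops variation selectors and pictographic emoji.
function stripEmoji(text) {
  return text
    .replace(/[\uFE0E\uFE0F]/g, '')
    .replace(/\p{Extended_Pictographic}|\p{Emoji_Presentation}/gu, '');
}

function normalizeSpeechTextSketch(text) {
  return stripEmoji(stripAvatarCommandTags(text)) // 0.2.27: tags removed before emoji
    .replace(/\*\*(.*?)\*\*/g, '$1') // **bold** -> bold
    .replace(/__(.*?)__/g, '$1') // __bold__ -> bold
    .replace(/`([^`]+)`/g, '$1') // `code` -> code
    .replace(/[\u2013\u2014]/g, ', ') // en or em dash -> comma pause
    .replace(/\u2026/g, '...') // single ellipsis character -> three dots
    .replace(/\s+/g, ' ') // collapse all whitespace, newlines included
    .trim();
}

console.log(normalizeSpeechTextSketch('Sure thing [gesture:nod] let me check **the calendar** \u{1F642}'));
// -> 'Sure thing let me check the calendar'
```

Because the tag regex uses the `i` flag, `[FACE:smile]` is stripped too, while an unlisted prefix such as `[note:x]` passes through unchanged.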
@@ -76,14 +80,14 @@ function normalizeSpeechText(text) {
     .trim();
 }
-function splitSpeechSegments(text) {
-  const normalized = normalizeSpeechText(text);
-  if (!normalized) return [];
-  const baseSegments = normalized
-    .split(/(?<=[.!?])\s+/)
-    .map((segment) => segment.trim())
-    .filter(Boolean);
+function splitSpeechSegments(text) {
+  const normalized = normalizeSpeechText(text);
+  if (!normalized) return [];
+  const baseSegments = normalized
+    .split(/(?<=[.!?])\s+|\n+/)
+    .map((segment) => segment.trim())
+    .filter(Boolean);
   const segments = [];
   for (const segment of baseSegments) {
@@ -92,19 +96,17 @@ function splitSpeechSegments(text) {
       continue;
     }
-    const clauseParts = segment
-      .split(/,\s+/)
-      .map((part) => part.trim())
-      .filter(Boolean);
-    if (clauseParts.length > 1) {
-      for (let index = 0; index < clauseParts.length; index += 1) {
-        const part = clauseParts[index];
-        const needsComma = index < clauseParts.length - 1 && !/[.!?]$/.test(part);
-        segments.push(needsComma ? `${part},` : part);
-      }
-      continue;
-    }
+    const clauseParts = segment
+      .split(/(?<=[,;:])\s+/)
+      .map((part) => part.trim())
+      .filter(Boolean);
+    if (clauseParts.length > 1) {
+      for (const part of clauseParts) {
+        segments.push(part);
+      }
+      continue;
+    }
     segments.push(segment);
   }
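The clause-splitting change is easiest to see side by side. The sketch below assumes a runtime with regex lookbehind support (Node.js 10 or later) and copies the regexes from this hunk into hypothetical helpers `oldClauses` and `newClauses`. 0.2.24 split only on `, ` and re-appended the comma; 0.2.27 splits after `,`, `;` or `:`, and because the lookbehind leaves that punctuation on each part, the re-append loop is no longer needed. Judging from the visible lines, `normalizeSpeechText` collapses all whitespace (newlines included) before `splitSpeechSegments` splits, so the new `\n+` sentence alternative may only take effect if the elided lines preserve line breaks; the sketch applies it to raw text.

```javascript
// Regexes copied from splitSpeechSegments in the diff.
const SENTENCE_SPLIT_0_2_27 = /(?<=[.!?])\s+|\n+/;
const CLAUSE_SPLIT_0_2_24 = /,\s+/;
const CLAUSE_SPLIT_0_2_27 = /(?<=[,;:])\s+/;

// Split, trim, and drop empty parts (same post-processing as the package).
function splitOn(text, pattern) {
  return text.split(pattern).map((part) => part.trim()).filter(Boolean);
}

// Hypothetical helper for 0.2.24: split on ", " and re-append the comma to
// every part except the last, unless the part already ends in . ! or ?.
function oldClauses(segment) {
  const parts = splitOn(segment, CLAUSE_SPLIT_0_2_24);
  return parts.map((part, index) =>
    index < parts.length - 1 && !/[.!?]$/.test(part) ? `${part},` : part,
  );
}

// Hypothetical helper for 0.2.27: the lookbehind keeps , ; : on each part.
function newClauses(segment) {
  return splitOn(segment, CLAUSE_SPLIT_0_2_27);
}

const segment = 'First, grab the keys; then: head out';
console.log(oldClauses(segment)); // -> ['First,', 'grab the keys; then: head out']
console.log(newClauses(segment)); // -> ['First,', 'grab the keys;', 'then:', 'head out']
console.log(splitOn('Hi there.\nHow are you? Good', SENTENCE_SPLIT_0_2_27));
// -> ['Hi there.', 'How are you?', 'Good']
```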
@@ -114,50 +116,62 @@ function splitSpeechSegments(text) {
   return [...segments.slice(0, 4), segments.slice(4).join(' ').trim()];
 }
-function inferSegmentStyle(segmentText, index, totalSegments) {
-  const normalized = segmentText.toLowerCase();
-  const exclamatory = /!/.test(segmentText) || /\b(hell yeah|awesome|amazing|stoked|love|perfect|great)\b/.test(normalized);
-  const curious = /\?/.test(segmentText);
-  const reflective =
-    /\b(i think|i'm|i am|i've|i have|lately|right now|before this|each time|understand|it feels like)\b/.test(normalized) ||
-    segmentText.length > 60;
-  if (curious) {
-    return {
-      pace: 'medium',
-      pitch: 'slightly_high',
-      energy: 'warm',
-      volume: 'normal',
-      pause_after_ms: 0,
-    };
-  }
+function inferSegmentStyle(segmentText, index, totalSegments) {
+  const normalized = segmentText.toLowerCase();
+  const greeting = /^(hey|hi|hello|yo)\b/.test(normalized);
+  const exclamatory = /!/.test(segmentText) || /\b(hell yeah|awesome|amazing|stoked|love|perfect|great)\b/.test(normalized);
+  const curious = /\?/.test(segmentText);
+  const reassuring = /\b(got it|no worries|all good|you'?re good|sounds good|totally|absolutely)\b/.test(normalized);
+  const reflective =
+    /\b(i think|i'm|i am|i've|i have|lately|right now|before this|each time|understand|it feels like)\b/.test(normalized) ||
+    segmentText.length > 60;
+  if (greeting || reassuring) {
+    return {
+      pace: 'medium_fast',
+      pitch: 'slightly_high',
+      energy: 'bright',
+      volume: 'projected',
+      pause_after_ms: index < totalSegments - 1 ? 180 : 0,
+    };
+  }
+  if (curious) {
+    return {
+      pace: 'medium',
+      pitch: 'slightly_high',
+      energy: 'warm',
+      volume: 'projected',
+      pause_after_ms: 0,
+    };
+  }
   if (exclamatory) {
     return {
-      pace: 'medium_fast',
-      pitch: 'slightly_high',
-      energy: 'bright',
-      volume: 'normal',
-      pause_after_ms: index < totalSegments - 1 ? 220 : 0,
-    };
-  }
-  if (reflective) {
-    return {
-      pace: 'medium',
-      pitch: 'neutral',
-      energy: 'warm',
-      volume: 'normal',
-      pause_after_ms: index < totalSegments - 1 ? 260 : 0,
-    };
-  }
-  return {
-    pace: 'medium',
-    pitch: 'neutral',
-    energy: 'warm',
-    volume: 'normal',
-    pause_after_ms: index < totalSegments - 1 ? 180 : 0,
+      pace: 'medium_fast',
+      pitch: 'slightly_high',
+      energy: 'bright',
+      volume: 'projected',
+      pause_after_ms: index < totalSegments - 1 ? 220 : 0,
+    };
+  }
+  if (reflective) {
+    return {
+      pace: 'slow',
+      pitch: 'slightly_low',
+      energy: 'warm',
+      volume: 'soft',
+      pause_after_ms: index < totalSegments - 1 ? 280 : 0,
+    };
+  }
+  return {
+    pace: 'medium',
+    pitch: 'slightly_high',
+    energy: 'warm',
+    volume: 'normal',
+    pause_after_ms: index < totalSegments - 1 ? 180 : 0,
   };
 }
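The new branch order in `inferSegmentStyle` decides which style a segment receives when several cues match. The sketch below is a hypothetical `classifySegment` helper that copies the 0.2.27 cue regexes and returns a branch name instead of a style object, plus a hypothetical `pauseAfter` helper holding the `pause_after_ms` values read off this hunk. One consequence is visible immediately: a greeting such as "Hey, how are you?" now takes the greeting style even though it contains a question mark.

```javascript
// Hypothetical helper: mirrors the cue detection and branch order of
// inferSegmentStyle in oomi-ai 0.2.27, but returns only the branch name.
function classifySegment(segmentText) {
  const normalized = segmentText.toLowerCase();
  const greeting = /^(hey|hi|hello|yo)\b/.test(normalized);
  const reassuring =
    /\b(got it|no worries|all good|you'?re good|sounds good|totally|absolutely)\b/.test(normalized);
  const curious = /\?/.test(segmentText);
  const exclamatory =
    /!/.test(segmentText) ||
    /\b(hell yeah|awesome|amazing|stoked|love|perfect|great)\b/.test(normalized);
  const reflective =
    /\b(i think|i'm|i am|i've|i have|lately|right now|before this|each time|understand|it feels like)\b/.test(normalized) ||
    segmentText.length > 60;

  // Same precedence as the 0.2.27 if-chain: greeting/reassuring wins first.
  if (greeting || reassuring) return 'greeting_or_reassuring';
  if (curious) return 'curious';
  if (exclamatory) return 'exclamatory';
  if (reflective) return 'reflective';
  return 'default';
}

// pause_after_ms for a non-final segment in each 0.2.27 branch.
const PAUSE_AFTER_MS = {
  greeting_or_reassuring: 180,
  curious: 0,
  exclamatory: 220,
  reflective: 280,
  default: 180,
};

// The final segment never gets a trailing pause.
function pauseAfter(branch, index, totalSegments) {
  return index < totalSegments - 1 ? PAUSE_AFTER_MS[branch] : 0;
}

const reply = ['Hey, how are you?', 'I think the demo went well.', 'That was awesome!'];
reply.forEach((segment, index) => {
  const branch = classifySegment(segment);
  console.log(segment, '->', branch, pauseAfter(branch, index, reply.length));
});
```

Note the `\b` after the greeting words: "Hiking is fun." does not count as a greeting, because `hi` is followed by another word character.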
@@ -177,11 +191,11 @@ function synthesizeSpokenSegments(text) {
   };
 }
-function normalizeSpokenMetadata(spoken) {
-  if (!spoken || typeof spoken !== 'object' || Array.isArray(spoken)) return null;
-  const text = trimString(spoken.text);
-  if (!text) return null;
+function normalizeSpokenMetadata(spoken) {
+  if (!spoken || typeof spoken !== 'object' || Array.isArray(spoken)) return null;
+  const text = normalizeSpeechText(trimString(spoken.text));
+  if (!text) return null;
   const normalized = { text };
   const language = trimString(spoken.language);
@@ -214,10 +228,10 @@ function normalizeSpokenMetadata(spoken) {
   return normalized;
 }
-function inferSpokenMetadataFromContent(content) {
-  const text = normalizeSpeechText(trimString(content));
-  if (!text) return null;
-  const synthesized = synthesizeSpokenSegments(text);
+function inferSpokenMetadataFromContent(content) {
+  const text = normalizeSpeechText(trimString(content));
+  if (!text) return null;
+  const synthesized = synthesizeSpokenSegments(text);
   const normalized = text.toLowerCase();
   const upbeat =
@@ -227,47 +241,48 @@ function inferSpokenMetadataFromContent(content) {
     /\b(sorry|gentle|softly|careful|reassuring|calm|okay|it'?s okay|i know)\b/.test(normalized);
   const curious = /\?/.test(text);
-  if (upbeat) {
-    return {
-      text,
-      language: synthesized?.language || 'English',
-      segments: synthesized?.segments,
-      instructions: 'Speak with warm, upbeat conversational energy and natural pacing.',
-      style: { emotion: 'upbeat', energy: 'medium' },
-    };
-  }
-  if (gentle) {
-    return {
-      text,
-      language: synthesized?.language || 'English',
-      segments: synthesized?.segments,
-      instructions: 'Speak gently and reassuringly, with a calm pace and soft emphasis.',
-      style: { emotion: 'gentle', energy: 'low' },
-    };
-  }
-  if (curious) {
-    return {
-      text,
-      language: synthesized?.language || 'English',
-      segments: synthesized?.segments,
-      instructions: 'Speak naturally with curious, engaged intonation and a conversational pace.',
-      style: { emotion: 'curious', energy: 'medium' },
-    };
-  }
-  return {
-    text,
-    language: synthesized?.language || 'English',
-    segments: synthesized?.segments,
-    instructions: 'Speak naturally with light warmth and conversational pacing.',
-    style: { emotion: 'neutral', energy: 'medium' },
-  };
-}
-export {
-  inferSpokenMetadataFromContent,
-  normalizeSpokenMetadata,
-  normalizeSpeechText,
-};
+  if (upbeat) {
+    return normalizeSpokenMetadata({
+      text,
+      language: synthesized?.language || 'English',
+      segments: synthesized?.segments,
+      instructions: 'Speak with warm, upbeat conversational energy and natural pacing.',
+      style: { emotion: 'upbeat', energy: 'medium' },
+    });
+  }
+  if (gentle) {
+    return normalizeSpokenMetadata({
+      text,
+      language: synthesized?.language || 'English',
+      segments: synthesized?.segments,
+      instructions: 'Speak gently and reassuringly, with a calm pace and soft emphasis.',
+      style: { emotion: 'gentle', energy: 'low' },
+    });
+  }
+  if (curious) {
+    return normalizeSpokenMetadata({
+      text,
+      language: synthesized?.language || 'English',
+      segments: synthesized?.segments,
+      instructions: 'Speak naturally with curious, engaged intonation and a conversational pace.',
+      style: { emotion: 'curious', energy: 'medium' },
+    });
+  }
+  return normalizeSpokenMetadata({
+    text,
+    language: synthesized?.language || 'English',
+    segments: synthesized?.segments,
+    instructions: 'Speak naturally with light warmth and conversational pacing.',
+    style: { emotion: 'neutral', energy: 'medium' },
+  });
+}
+export {
+  inferSpokenMetadataFromContent,
+  normalizeSpokenMetadata,
+  normalizeSpeechText,
+  stripAvatarCommandTags,
+};

package/openclaw.plugin.json CHANGED

@@ -2,7 +2,7 @@
   "id": "oomi-ai",
   "name": "Oomi Channel Plugin",
   "description": "Managed Oomi channel integration for OpenClaw.",
-  "version": "0.2.24",
+  "version": "0.2.27",
   "author": "Oomi",
   "license": "MIT",
   "openclawVersion": ">=0.5.0",

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "oomi-ai",
-  "version": "0.2.24",
+  "version": "0.2.27",
   "description": "Oomi OpenClaw channel plugin and bridge tooling",
   "bin": {
     "oomi": "bin/oomi-ai.js"