npm - oomi-ai - Versions diffs - 0.2.14 → 0.2.15 - Mend

oomi-ai 0.2.14 → 0.2.15

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

package/agent_instructions.md +30 -0
package/openclaw.extension.js +41 -2
package/openclaw.plugin.json +1 -1
package/package.json +1 -1
package/skills/oomi/SKILL.md +29 -0
package/skills/oomi/agent_instructions.md +30 -0

package/agent_instructions.md CHANGED

@@ -138,6 +138,36 @@ That means:
 - STT can succeed even when assistant reply delivery is broken later in the run
 - a `voice_session_*` failure should be investigated, but it should not automatically be treated as proof that all normal Oomi chat is down
+### Hidden Speech Payload
+For managed voice turns, keep visible assistant chat text natural and user-facing.
+Do not put spoken-style tags like `[happy]`, `[sad]`, or `[excited]` into visible chat text.
+When the runtime supports it, voice turns may include a hidden speech sidecar on the assistant message:
+```json
+{
+  "metadata": {
+    "spoken": {
+      "text": "Speech-optimized text for TTS only.",
+      "instructions": "Speak with upbeat, warm excitement and slightly rising intonation.",
+      "style": {
+        "emotion": "excited",
+        "energy": "medium_high"
+      }
+    }
+  }
+}
+```
+Rules:
+- visible `content` remains the source of truth for Oomi chat rendering
+- for managed voice replies, include `metadata.spoken` when delivery benefits from cleaner phrasing or explicit speaking guidance
+- `metadata.spoken.text` is for backend TTS only
+- `metadata.spoken.instructions` should be natural-language guidance, not raw bracket tags
+- `metadata.spoken.style` is optional metadata for debugging/future mapping
+- if no hidden speech sidecar exists, Oomi falls back to speaking the visible assistant text
 ## Avatar Commands
 Before using avatar commands, call `get_avatar_capabilities` and prefer canonical values.
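The rules above lend themselves to a mechanical pre-send check. Below is a minimal sketch, assuming a reply object shaped like the JSON example (visible `content` plus an optional `metadata.spoken`). `checkVoiceReply` and the bracket-tag pattern are hypothetical illustrations, not part of the oomi-ai package.

```javascript
// Hypothetical pre-send check for a managed voice reply (not part of oomi-ai).
// It flags the problems the "Hidden Speech Payload" rules call out and
// returns a list of warning strings (empty when the reply looks clean).

// Assumed shape of spoken-style tags such as [happy], [sad], [excited].
const BRACKET_TAG = /\[[a-z_]+\]/i;

function checkVoiceReply(message) {
  const warnings = [];
  const content = message && typeof message.content === 'string' ? message.content : '';

  // Visible chat text must stay natural: no raw intonation tags.
  if (BRACKET_TAG.test(content)) {
    warnings.push('visible content contains a bracket tag');
  }

  const spoken = message && message.metadata ? message.metadata.spoken : undefined;
  if (spoken != null) {
    // A sidecar without usable text is useless for TTS.
    if (typeof spoken.text !== 'string' || spoken.text.trim() === '') {
      warnings.push('metadata.spoken.text is missing or empty');
    }
    // Instructions should be natural-language guidance, not bracket tags.
    if (typeof spoken.instructions === 'string' && BRACKET_TAG.test(spoken.instructions)) {
      warnings.push('metadata.spoken.instructions uses bracket tags');
    }
  }
  return warnings;
}

// Example: a reply that follows the rules.
const reply = {
  content: 'Great news, your build passed!',
  metadata: {
    spoken: {
      text: 'Great news! Your build passed.',
      instructions: 'Speak with upbeat, warm excitement and slightly rising intonation.',
      style: { emotion: 'excited', energy: 'medium_high' },
    },
  },
};
console.log(checkVoiceReply(reply).length); // 0
```

The regex is deliberately loose and will also match things like Markdown checkboxes (`[x]`), so treat its output as a hint rather than a hard failure.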

package/openclaw.extension.js CHANGED

@@ -178,6 +178,45 @@ function extractCorrelationId(payload) {
   return '';
 }
+function normalizeSpokenMetadata(spoken) {
+  if (!spoken || typeof spoken !== 'object' || Array.isArray(spoken)) return null;
+  const text = toString(spoken.text);
+  if (!text) return null;
+  const normalized = { text };
+  const instructions = toString(spoken.instructions);
+  if (instructions) normalized.instructions = instructions;
+  if (spoken.style && typeof spoken.style === 'object' && !Array.isArray(spoken.style)) {
+    normalized.style = spoken.style;
+  }
+  return normalized;
+}
+function normalizeOutgoingMetadata(payloadMetadata, { accountId, correlationId }) {
+  const metadata =
+    payloadMetadata && typeof payloadMetadata === 'object' && !Array.isArray(payloadMetadata)
+      ? { ...payloadMetadata }
+      : {};
+  const spoken = normalizeSpokenMetadata(metadata.spoken);
+  if (spoken) {
+    metadata.spoken = spoken;
+  } else {
+    delete metadata.spoken;
+  }
+  metadata.accountId = accountId;
+  if (correlationId) {
+    metadata.correlationId = correlationId;
+  } else {
+    delete metadata.correlationId;
+  }
+  return metadata;
+}
 async function postJson({ url, token, body, timeoutMs }) {
   const controller = new AbortController();
   const timeout = setTimeout(() => controller.abort(), timeoutMs);
@@ -289,10 +328,10 @@ const oomiChannelPlugin = {
           sessionKey,
           content,
           source: 'openclaw.channel',
-          metadata: {
+          metadata: normalizeOutgoingMetadata(payload?.metadata, {
             accountId: resolvedAccountId,
             correlationId,
-          },
+          }),
         },
       });
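To see what the new normalizers do to an outgoing payload, the two functions from this diff can be replayed standalone. The sketch below copies their logic with the if/else blocks lightly condensed. The `toString` helper is defined elsewhere in `openclaw.extension.js` and is not shown in this diff, so the stand-in here (trim strings, return `''` for anything else) is an assumption; the exact output text depends on it.

```javascript
// Standalone replay of the 0.2.15 normalizers, for experimentation only.
// ASSUMPTION: the real `toString` is not shown in the diff; this stand-in
// returns trimmed strings and '' for non-string values.
function toString(value) {
  return typeof value === 'string' ? value.trim() : '';
}

function normalizeSpokenMetadata(spoken) {
  if (!spoken || typeof spoken !== 'object' || Array.isArray(spoken)) return null;
  const text = toString(spoken.text);
  if (!text) return null; // a sidecar without text is dropped entirely
  const normalized = { text };
  const instructions = toString(spoken.instructions);
  if (instructions) normalized.instructions = instructions; // empty instructions omitted
  if (spoken.style && typeof spoken.style === 'object' && !Array.isArray(spoken.style)) {
    normalized.style = spoken.style; // passed through as-is, not validated
  }
  return normalized;
}

function normalizeOutgoingMetadata(payloadMetadata, { accountId, correlationId }) {
  const metadata =
    payloadMetadata && typeof payloadMetadata === 'object' && !Array.isArray(payloadMetadata)
      ? { ...payloadMetadata } // shallow copy: other caller keys survive
      : {};
  const spoken = normalizeSpokenMetadata(metadata.spoken);
  if (spoken) metadata.spoken = spoken;
  else delete metadata.spoken;
  metadata.accountId = accountId; // plugin-resolved account always wins
  if (correlationId) metadata.correlationId = correlationId;
  else delete metadata.correlationId; // no stale caller-supplied id leaks through
  return metadata;
}

const out = normalizeOutgoingMetadata(
  {
    spoken: { text: '  Hi there!  ', instructions: '', style: ['not', 'an', 'object'] },
    accountId: 'spoofed',
    correlationId: 'stale',
    topic: 'greeting',
  },
  { accountId: 'acct-1', correlationId: '' },
);
console.log(JSON.stringify(out));
// {"spoken":{"text":"Hi there!"},"accountId":"acct-1","topic":"greeting"}
```

Compared with 0.2.14, which sent only `accountId` and `correlationId`, arbitrary caller metadata keys now pass through. Because `accountId` is assigned after the spread, a payload cannot override the plugin-resolved account, and an empty `correlationId` removes any caller-supplied one.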

package/openclaw.plugin.json CHANGED

@@ -2,7 +2,7 @@
   "id": "oomi-ai",
   "name": "Oomi Channel Plugin",
   "description": "Managed Oomi channel integration for OpenClaw.",
-  "version": "0.2.14",
+  "version": "0.2.15",
   "author": "Oomi",
   "license": "MIT",
   "openclawVersion": ">=0.5.0",

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "oomi-ai",
-  "version": "0.2.14",
+  "version": "0.2.15",
   "description": "Oomi OpenClaw channel plugin and bridge tooling",
   "bin": {
     "oomi": "bin/oomi-ai.js"

package/skills/oomi/SKILL.md CHANGED

@@ -128,6 +128,35 @@ Install packaged Oomi operator instructions into an OpenClaw `AGENTS.md` file.
 python3 skills/oomi/scripts/install_agent_instructions.py
 ```
+## Hidden Speech Payload
+Managed voice can carry a hidden TTS-only speech sidecar alongside the normal assistant message.
+Use this shape when a voice turn needs more natural delivery without changing visible chat text:
+```json
+{
+  "metadata": {
+    "spoken": {
+      "text": "Speech-optimized text for TTS only.",
+      "instructions": "Speak with upbeat, warm excitement and slightly rising intonation.",
+      "style": {
+        "emotion": "excited",
+        "energy": "medium_high"
+      }
+    }
+  }
+}
+```
+Rules:
+- keep visible assistant `content` clean and user-facing
+- do not place raw intonation tags in visible chat
+- for managed voice replies, include `metadata.spoken` when delivery benefits from cleaner phrasing or explicit speaking guidance
+- `metadata.spoken.text` is backend TTS input only
+- `metadata.spoken.instructions` should use natural-language speaking guidance
+- if the speech sidecar is absent, Oomi speaks the visible assistant text
 ## Avatar Control
 Before emitting avatar commands, call `get_avatar_capabilities` and prefer canonical values.
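The fallback rule ("if the speech sidecar is absent, Oomi speaks the visible assistant text") means a TTS consumer needs only a small selection step. The sketch below illustrates that documented rule; `selectSpeechInput` is a hypothetical name, and how Oomi's backend actually implements the fallback is not shown in this diff.

```javascript
// Hypothetical sketch of the documented TTS fallback (not Oomi's backend code):
// prefer the hidden sidecar text, otherwise speak the visible content.
function selectSpeechInput(message) {
  const spoken = message && message.metadata ? message.metadata.spoken : undefined;
  if (spoken && typeof spoken.text === 'string' && spoken.text.trim() !== '') {
    const input = { text: spoken.text };
    // Carry speaking guidance along only when the sidecar provides it.
    if (typeof spoken.instructions === 'string' && spoken.instructions !== '') {
      input.instructions = spoken.instructions;
    }
    return input;
  }
  // No usable sidecar: fall back to the visible assistant text.
  return { text: message && typeof message.content === 'string' ? message.content : '' };
}

console.log(selectSpeechInput({ content: 'Done! Your file is saved.' }).text);
// Done! Your file is saved.
console.log(
  selectSpeechInput({
    content: 'Done! Your file is saved.',
    metadata: { spoken: { text: 'All done. Your file is saved.' } },
  }).text,
);
// All done. Your file is saved.
```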

package/skills/oomi/agent_instructions.md CHANGED

@@ -44,3 +44,33 @@ Primary UX requirement:
 - Return `/connect/<invite-token>` auth links only.
 Do not ask users to paste gateway IP/token/password when managed connect is available.
+## Hidden Speech Payload
+For managed voice turns, keep visible assistant chat text natural and user-facing.
+Do not put spoken-style tags like `[happy]`, `[sad]`, or `[excited]` into visible chat text.
+When the runtime supports it, voice turns may include a hidden speech sidecar on the assistant message:
+```json
+{
+  "metadata": {
+    "spoken": {
+      "text": "Speech-optimized text for TTS only.",
+      "instructions": "Speak with upbeat, warm excitement and slightly rising intonation.",
+      "style": {
+        "emotion": "excited",
+        "energy": "medium_high"
+      }
+    }
+  }
+}
+```
+Rules:
+- visible `content` remains the source of truth for Oomi chat rendering
+- for managed voice replies, include `metadata.spoken` when delivery benefits from cleaner phrasing or explicit speaking guidance
+- `metadata.spoken.text` is for backend TTS only
+- `metadata.spoken.instructions` should be natural-language guidance, not raw bracket tags
+- `metadata.spoken.style` is optional metadata for debugging or future mapping
+- if no hidden speech sidecar exists, Oomi falls back to speaking the visible assistant text