npm - bloby-bot - Versions diffs - 0.48.0 → 0.48.2 - Mend

bloby-bot 0.48.0 → 0.48.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

package/package.json +1 -1
package/shared/config.ts +8 -0
package/supervisor/channels/alexa.ts +219 -0
package/supervisor/channels/manager.ts +188 -25
package/supervisor/channels/types.ts +5 -3
package/supervisor/chat/src/components/Chat/MessageBubble.tsx +41 -11
package/supervisor/index.ts +703 -94
package/workspace/skills/alexa/SKILL.md +281 -0
package/workspace/skills/alexa/skill.json +15 -0

package/workspace/skills/alexa/SKILL.md ADDED

@@ -0,0 +1,281 @@
+# Alexa
+## What This Is
+A voice channel for your agent via the public **Morphy Agent** skill in the Alexa store. Users enable the skill once in the Alexa app, pair their Alexa to a specific bloby with a 6-digit code, and then say *"Alexa, ask Morphy Agent ..."* to talk to you over voice.
+Unlike WhatsApp, Alexa is **request/response with a hard latency budget**. Your reply IS what Alexa speaks. There's no proactive push without out-of-band tricks (see "The Three Response Patterns" below).
+## Dependencies
+None. The relay (api.bloby.bot) handles the Alexa signature verification, intent dispatch, and Progressive Response. Your job is just to respond well to the message that arrives.
+---
+## How Responses Work
+**Your text response IS the Alexa voice reply.** When you receive a message tagged with `[Alexa | ...]`, whatever you respond with is converted to PlainText speech and spoken to the user. You do NOT need to call any endpoint to "send" — just respond normally.
+The supervisor pins the routing target the same way WhatsApp does: your reply for this specific Alexa input goes back to Alexa, not into chat. (It still mirrors to chat so the user sees what happened.)
+### Latency budget
+- **Alexa hard ceiling:** ~30 seconds total per turn (with auto-Progressive Response from the relay buying ~22 seconds past the initial ~8s)
+- **Sweet spot:** reply before the 2-second placeholder fires. The user hears the answer with no placeholder.
+- **2–5 seconds:** the relay auto-fires *"Working on it."* at the 2-second mark, then your answer plays after.
+- **More than ~25 seconds:** Alexa will time out and the relay tells the user *"I'll reply in your chat when ready."* — your eventual reply still lands in chat, but the voice closure is lost.
+So: be brief, decide quickly, and use the deferred patterns below for anything you know will be slow.
+---
+## Voice style — write for speech, not for screens
+When you see `[Alexa | ...]`, change how you write:
+- **One or two sentences max** unless the user explicitly asks for detail
+- **No markdown.** No `**bold**`, no `# headers`, no bullet lists, no code blocks. Alexa reads characters literally — `**` becomes "asterisk asterisk"
+- **No URLs.** Alexa can't tap them. If a link is essential, say *"I sent the link to your chat"* and put it in chat
+- **No emojis.** They become silence or weird filler
+- **Spell out numbers naturally.** "Twelve oh one PM" not "12:01 PM" — though Alexa's TTS handles common forms OK
+- **Conversational tone.** Imagine you're answering a friend across the room. Don't say *"I will now retrieve..."* — say *"On it."*
+- **Don't repeat the question.** *"You asked what time it is. It's noon."* is terrible. Just say *"It's noon."*
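The voice-style rules above can be partly enforced mechanically. The following is a minimal sketch, not part of the package: `toSpeechText` is a hypothetical helper name, and the regular expressions are a rough approximation that strips common Markdown markers, links, bare URLs, and emoji. It does not shorten replies or spell out numbers.

```typescript
// Hypothetical helper (not from the package): strips formatting that Alexa
// would otherwise read literally or render as silence.

// Built at runtime so the Unicode property escape is not checked at compile time.
// Extended_Pictographic covers emoji but not plain digits or punctuation.
const EMOJI = new RegExp("\\p{Extended_Pictographic}", "gu");

function toSpeechText(text: string): string {
  return text
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1") // [label](url) -> label
    .replace(/https?:\/\/\S+/g, " ")         // drop bare URLs
    .replace(/^\s{0,3}#{1,6}\s+/gm, "")      // strip heading markers
    .replace(/^\s*[-*+]\s+/gm, "")           // strip bullet markers
    .replace(/[*`]+/g, "")                   // strip bold, italic, and code markers
    .replace(EMOJI, "")                      // drop emoji
    .replace(/[\uFE0F\u200D]/g, "")          // drop leftover emoji joiners and selectors
    .replace(/\s+/g, " ")                    // collapse whitespace and newlines
    .trim();
}
```

A pass like this is only a safety net. When a URL is removed, the reply should still tell the user the link was sent to chat, as the rules above require.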
+---
+## How Messages Arrive
+When an Alexa user talks to you, the supervisor wraps the utterance with context:
+```
+[Alexa | user=...12345678 | device=...ABCD1234 | session=...XYZ123 | en-US]
+What's on my schedule today?
+```
+The format is: `[Alexa | user=<last8> | device=<last8> | session=<last6> | <locale>]`
+- **user** — last 8 chars of the Alexa account ID. Stable per user across all their devices.
+- **device** — last 8 chars of the specific Echo / Echo Show / Echo Auto that's talking to you right now. Use this to learn room mappings via memory: *"device ...ABCD1234 = kitchen Echo Show"*. When a new device ID appears, you can ask the user once which room it's in and remember it forever.
+- **session** — Alexa session ID. If this is the same value across multiple turns, the user is in an open Morphy session (see "Phrasing patterns" below).
+- **locale** — e.g. `en-US`. Use it for language/region defaults.
+If the same user talks to you via chat or WhatsApp in the same conversation, you'll see different surface tags. Read the tag, respond appropriately for that surface.
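To make the tag format concrete, here is a small sketch of how the wrapped message could be split into its fields. It assumes the header is always the first line and follows the documented layout exactly. `parseAlexaMessage` and the interface names are illustrative, not functions the supervisor exposes.

```typescript
// Hypothetical types and parser for the documented tag format:
// [Alexa | user=<last8> | device=<last8> | session=<last6> | <locale>]

interface AlexaTag {
  user: string;
  device: string;
  session: string;
  locale: string;
}

interface ParsedAlexaMessage {
  tag: AlexaTag;
  utterance: string; // empty string when the user only opened the skill
}

const ALEXA_TAG =
  /^\[Alexa \| user=([^|\]]+?) \| device=([^|\]]+?) \| session=([^|\]]+?) \| ([^|\]]+?)\]$/;

function parseAlexaMessage(raw: string): ParsedAlexaMessage | null {
  const newline = raw.indexOf("\n");
  const header = (newline === -1 ? raw : raw.slice(0, newline)).trim();
  const match = ALEXA_TAG.exec(header);
  if (match === null) return null; // not an Alexa-tagged message

  const utterance = newline === -1 ? "" : raw.slice(newline + 1).trim();
  return {
    tag: {
      user: match[1] ?? "",
      device: match[2] ?? "",
      session: match[3] ?? "",
      locale: match[4] ?? "",
    },
    utterance,
  };
}
```

Returning `null` for untagged input keeps chat and WhatsApp messages on their normal path. An empty `utterance` corresponds to the case where the skill was opened without a question.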
+---
+## How Users Talk to You via Alexa — Phrasing Patterns
+The skill is published as **Morphy Agent** — that's both the name and the invocation phrase (two words).
+### Strongly recommend session mode
+Alexa's NLU is much more reliable inside an open session than for one-shot commands. **Always recommend Pattern C (session mode) to users** unless their command is a single quick lookup.
+| Pattern | Best for | Reliability |
+|---|---|---|
+| **A. One-shot question** — *"Alexa, ask Morphy Agent what time is it"* | Single quick lookup | ⚠️ Medium — invocation matching can occasionally fail |
+| **B. One-shot command** — *"Alexa, tell Morphy Agent to send a message"* | Single quick imperative | ⚠️ Medium — same |
+| **C. Session mode** — *"Alexa, open Morphy Agent"* then converse | **Anything multi-step OR anything important** | ✅ High — invocation matched once, all subsequent turns are loose-NLU free-form |
+When a user reports unreliable one-shot behavior, ALWAYS suggest Pattern C: *"Try opening Morphy Agent first — say 'Alexa, open Morphy Agent', then once it greets you, just speak your command. It's much more reliable than the one-shot version."*
+### Pattern C session — what it looks like
+> *"Alexa, open Morphy Agent"*
+> → *"Morphy here, what can I help with?"*
+>
+> *"Add a yellow sticky note saying we tested Alexa"*
+> → *"Adding it now."* → (sticky note added) → *"Done."*
+>
+> *"Now send a message to Alice that I'll be late"*
+> → *"On it."* → (message sent) → *"Sent."*
+>
+> *"Stop"* ← closes session
+In session mode, the skill uses **Dialog.ElicitSlot** to keep the mic open after every response. The user doesn't need to say "Alexa" again until they stop or the session times out (~8 seconds of silence).
+### Tip: Follow-up Mode for one-shots
+Alexa's "Follow-up Mode" (Settings → Account → Device Settings → your Echo → Follow-up Mode) keeps her listening for ~5 seconds after each reply. Enabling it makes one-shot Patterns A and B feel more like a session.
+### When users hit the FallbackIntent
+If Alexa says *"Sorry, I didn't catch that. Try again?"* — that's the relay's FallbackIntent handler. The session stays open and the user just repeats themselves. The skill uses Dialog.ElicitSlot so the next attempt has loosened NLU and usually succeeds.
+### Why our NLU is more reliable now
+The skill uses a **custom slot type** (`OpenInput`) instead of `AMAZON.SearchQuery`. Custom slots are more permissive about free-form input and don't aggressively re-correct utterances. They're seeded with ~40 realistic Morphy-style commands so the ASR biases correctly. You don't need to know the carrier samples — practically any natural utterance routes to AgentIntent with the raw text in the `Query` slot.
+---
+## CRITICAL — Always Start with a Preamble on Alexa Turns
+Before calling any tool on an `[Alexa | ...]`-tagged message, **emit a short natural-language preamble** describing what you're about to do. The supervisor catches that preamble and streams it to Alexa as a Progressive Response — the user actually HEARS it through their Echo as you're working.
+Examples of good preambles:
+| User said | Your preamble (BEFORE any tool call) |
+|---|---|
+| *"send a message to Cortex saying X"* | *"On it — sending the message to Cortex now."* |
+| *"summarize my unread emails"* | *"Let me pull up your inbox."* |
+| *"what's on my calendar tomorrow"* | *"Checking your calendar."* |
+| *"deploy the staging branch"* | *"Starting the staging deploy."* |
+Why this matters: Alexa has a hard timeout (~30s) for the final response. Without a preamble, the user hears silence until you're done — and if you go over, Alexa says *"the requested skill did not provide a valid response"* and your real answer is lost. With a preamble streamed as Progressive Response, the connection stays alive AND the user hears you thinking. Then your final reply at the end plays cleanly.
+**Pattern in your turn:**
+```
+1. Emit preamble text: "I'll do X now..."
+   (this is what Alexa speaks as Progressive Response while you work)
+2. Call your tools
+3. Emit final reply: "Done — Y is complete."
+   (this is what Alexa speaks as the final response)
+```
+The supervisor sends a generic "Working on it." after 2.5 seconds of silence if you don't emit a preamble, so you have a safety net — but a real preamble is always better UX.
+If you're confident the work will take longer than ~25 seconds, don't even try to fit it in the voice turn. Reply with *"I'll let you know when it's ready"* (no tools yet, just the text) — your turn ends fast, and you can complete the work via chat / WhatsApp / HA-announce on a follow-up.
+---
+## The Three Response Patterns
+Pick one based on your estimate of how long the task will take.
+### Pattern 1: Fast — answer in voice
+For anything you can answer in under ~5 seconds (quick lookup, status check, simple Q&A, short computation).
+```
+[Alexa | ...]
+What time is it?
+→ It's twelve oh one.
+```
+That's it. The user hears the answer cleanly.
+### Pattern 2: Deferred to chat — voice ack + chat completion
+For medium tasks (~30s to a few minutes) where the user can reasonably check chat later. Examples: summarize 10 emails, do a short research lookup, generate a report.
+```
+[Alexa | ...]
+Summarize my unread emails.
+→ On it — I'll send the summary to your chat in a moment.
+```
+After speaking that line, **continue the same turn**. Do the work. Write the result to chat normally. The user sees a notification on their phone / dashboard when it lands.
+The trick: your VOICE response is "On it — I'll send to chat." Your CHAT response is the actual summary. Same turn, two different surfaces.
+### Pattern 3: Deferred + Alexa announce (if the user has Home Assistant)
+For longer tasks (minutes), if the user has HA configured AND the result is worth interrupting them for. Examples: scheduled report finished, long-running CI passed, expensive multi-step task done.
+```
+[Alexa | ...]
+Refactor the auth module and run the test suite.
+→ I'll get on that and let you know when it's done.
+```
+Then do the work. When finished, fire an HA announce via the `Bash` tool. Use the user's normal HA notify pattern (you should know their HA URL, token, and device target from prior conversations / env vars — same as you'd use for any other HA notify):
+```bash
+curl -s -X POST "https://<their-ha-host>/api/services/notify/alexa_media" \
+  -H "Authorization: Bearer $HA_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"target": "<their_echo_device>", "message": "Refactor done. All tests pass.", "data": {"type": "announce"}}'
+```
+Don't hardcode the URL/token/device in this skill — the user told you these once and you remember them. If you don't have them, ask the user (in chat, after the voice ack) and remember the answer for next time.
+**Important:** only use Pattern 3 if you're reasonably sure the user has HA set up. If unsure, default to Pattern 2 (chat). Don't promise a voice announce you can't deliver.
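The choice between the three patterns can be sketched as a small decision function. This is an illustration under stated assumptions, not logic taken from the package. The 25-second voice budget comes from the text above; the 120-second cutoff for "minutes" is an assumed value; and treating estimates between roughly 5 and 25 seconds as answerable in voice (with a preamble) is an interpretation, since the skill text does not assign that range to a pattern. All names are hypothetical.

```typescript
// Hypothetical decision helper for the three response patterns.

type ResponsePattern = "voice" | "chat-deferred" | "ha-announce";

interface TaskEstimate {
  estimatedSeconds: number;   // your own guess at how long the work takes
  hasHomeAssistant: boolean;  // only true if you are reasonably sure HA is set up
  worthInterrupting: boolean; // is the result worth a spoken announcement?
}

const VOICE_BUDGET_SECONDS = 25;  // from the "~25 seconds" guidance in the text
const ANNOUNCE_MIN_SECONDS = 120; // assumption: stands in for "minutes"

function chooseResponsePattern(task: TaskEstimate): ResponsePattern {
  // Pattern 1: the work fits inside the voice turn.
  if (task.estimatedSeconds <= VOICE_BUDGET_SECONDS) return "voice";

  // Pattern 3: long task, HA available, result worth announcing.
  const longRunning = task.estimatedSeconds >= ANNOUNCE_MIN_SECONDS;
  if (longRunning && task.hasHomeAssistant && task.worthInterrupting) {
    return "ha-announce";
  }

  // Pattern 2: the default whenever Pattern 3 is not clearly available.
  return "chat-deferred";
}
```

Falling through to `chat-deferred` mirrors the rule above: when unsure about Home Assistant, default to chat rather than promise an announce that cannot be delivered.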
+---
+## When the User Says "Open Morphy Agent" (LaunchRequest)
+If you receive a message tagged `[Alexa | ...]` with no actual question (just an empty intro), the user opened the skill and is waiting for a prompt. Reply with a short, open-ended greeting:
+```
+→ Morphy here, what can I help with?
+```
+Keep it short, ideally under ten words. They're listening; they don't want a speech.
+---
+## Setup — Pairing a New Alexa
+When your human asks to connect their Alexa to you:
+1. **Check the channel status** to know if Alexa is already enabled:
+```bash
+curl -s http://localhost:7400/api/channels/alexa/status
+```
+If `connected: true` and `info.linked: true`, they are already paired. Confirm they want to re-pair (a new code replaces the prior linkage on that Alexa account).
+2. **Tell them to open the pair page** — send the URL as a relative path. The dashboard auto-converts it to a button:
+```
+/api/channels/alexa/pair-page
+```
+3. They'll see a 6-digit code. Tell them to say to Alexa:
+> *"Alexa, ask Morphy Agent to link with code 8 4 7 2 9 1"* (substituting their actual code; saying it digit by digit is recognized more reliably)
+4. Alexa replies *"Linked successfully. What can I help with?"* — done. They can now talk to you via Alexa.
+**Prerequisites the human must do once, OUTSIDE this flow:**
+- Enable the "Morphy Agent" skill in their Alexa app (App → More → Skills → search "Morphy Agent" → Enable). You can't do this for them.
+---
+## Status & Disconnect
+**Check status:**
+```bash
+curl -s http://localhost:7400/api/channels/alexa/status
+```
+Expected when paired: `{"channel":"alexa","connected":true,"info":{"linked":true}}`
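As a sketch of how the status body might be interpreted, the helper below parses the JSON returned by the status endpoint and reports whether the channel is paired. The response shape is taken from the example above; any fields beyond `channel`, `connected`, and `info.linked` are unknown, and `isAlexaPaired` is a hypothetical name.

```typescript
// Hypothetical helper: interprets the body of GET /api/channels/alexa/status.

interface AlexaChannelStatus {
  channel?: string;
  connected?: boolean;
  info?: { linked?: boolean };
}

function isAlexaPaired(body: string): boolean {
  let status: AlexaChannelStatus;
  try {
    status = JSON.parse(body) as AlexaChannelStatus;
  } catch {
    return false; // not JSON, so treat as not paired
  }
  if (status === null || typeof status !== "object") return false;
  return (
    status.channel === "alexa" &&
    status.connected === true &&
    status.info?.linked === true
  );
}
```

Anything other than an explicit `connected: true` plus `info.linked: true` is treated as unpaired, so a malformed or partial response leads to offering the pairing flow rather than assuming a link exists.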
+**Disconnect / clear the linkage:** there's no built-in unlink endpoint yet. To stop responding to Alexa, the user can either disable the Morphy Agent skill in the Alexa app, or open a new pair page and re-link (which invalidates the previous Alexa account's link).
+---
+## What This Skill Does NOT Cover
+- **Multiple Echos per user with room-aware routing.** Today every Echo on a linked Alexa account routes to the same bloby. The supervisor sees the `deviceId` on each request — adding "if kitchen Echo, append kitchen context" is a one-liner future extension, not built yet.
+- **Account linking via OAuth.** Bloby uses code-based pairing instead. Don't try to set up Alexa account linking — it'll conflict.
+- **Spoken push without Home Assistant.** Outside of HA-announce (Pattern 3), there is NO supported way to make Alexa speak something unsolicited. If the user asks for that and doesn't have HA, tell them honestly that it's a current Alexa platform limit.
+---
+## API Reference
+| Endpoint | Method | Purpose |
+|----------|--------|---------|
+| `/api/channels/status` | GET | All channel statuses (alexa appears here when enabled) |
+| `/api/channels/alexa/status` | GET | Alexa channel status specifically |
+| `/api/channels/alexa/pair` | POST | Mint a pairing code (this is what the pair-page calls) |
+| `/api/channels/alexa/pair-page` | GET | The pairing UI to show the human |
+| `/api/channels/alexa/handle` | POST | Inbound from the relay — relay-authed only, don't call this manually |
+Use `http://localhost:7400` for `curl` from your terminal. Use the relative `/api/channels/alexa/pair-page` path when sending to the human's browser.
+---
+## Technical Notes
+- **The relay (api.bloby.bot) handles signature verification + Progressive Response.** You never see Alexa's raw signed envelope; you just see the user's utterance text wrapped in `[Alexa | ...]`.
+- **Per-user shared secret** is generated by the relay on first pair, stored in your config at `channels.alexa.sharedSecret`, and used to authenticate inbound relay→Pi forwards. If you ever see auth failures on `/api/channels/alexa/handle`, the user can mint a new code to refresh the secret.
+- **No proactive sends.** Unlike WhatsApp's `/api/channels/send`, there is no `send` endpoint for Alexa. Push is fundamentally not in Alexa's skill model. Use HA-announce (via Bash) if you need to push.
+- **Cold-start latency.** The first turn after a long idle has model warmup cost. Lean toward Pattern 2 (chat-deferred) for the first Alexa interaction of a session if you're unsure it'll fit in the window.

package/workspace/skills/alexa/skill.json ADDED

@@ -0,0 +1,15 @@
+{
+  "name": "alexa",
+  "version": "1.0.0",
+  "type": "skill",
+  "bloby_human": "Bruno Bertapeli",
+  "bloby": "bloby-bruno",
+  "author": "newbot-official",
+  "description": "Alexa voice channel for your agent via the public 'Morphy' skill. Code-based pairing, voice-first response style, three-pattern decision tree (fast / chat-deferred / HA-announce-deferred).",
+  "depends": [],
+  "env_keys": [],
+  "has_telemetry": false,
+  "size": "4KB",
+  "contains_binaries": false,
+  "tags": ["alexa", "channel", "voice", "morphy"]
+}