bloby-bot 0.48.0 → 0.48.2

@@ -0,0 +1,281 @@
1
+ # Alexa
2
+
3
+ ## What This Is
4
+
5
+ A voice channel for your agent via the public **Morphy Agent** skill in the Alexa store. Users enable the skill once in the Alexa app, pair their Alexa to a specific bloby with a 6-digit code, and then say *"Alexa, ask Morphy Agent ..."* to talk to you over voice.
6
+
7
+ Unlike WhatsApp, Alexa is **request/response with a hard latency budget**. Your reply IS what Alexa speaks. There's no proactive push without out-of-band tricks (see "The Three Response Patterns" below).
8
+
9
+ ## Dependencies
10
+
11
+ None. The relay (api.bloby.bot) handles the Alexa signature verification, intent dispatch, and Progressive Response. Your job is just to respond well to the message that arrives.
12
+
13
+ ---
14
+
15
+ ## How Responses Work
16
+
17
+ **Your text response IS the Alexa voice reply.** When you receive a message tagged with `[Alexa | ...]`, whatever you respond with is converted to PlainText speech and spoken to the user. You do NOT need to call any endpoint to "send" — just respond normally.
18
+
19
+ The supervisor pins the routing target the same way it does for WhatsApp: your reply to this specific Alexa input goes back to Alexa, not into chat. (It is still mirrored to chat so the user sees what happened.)
20
+
21
+ ### Latency budget
22
+
23
+ - **Alexa hard ceiling:** ~30 seconds total per turn (with auto-Progressive Response from the relay buying ~22 seconds past the initial ~8s)
24
+ - **Sweet spot:** reply in under 5 seconds. Under roughly 2 seconds, the user hears the answer with no placeholder.
25
+ - **2–5 seconds:** the relay auto-fires *"Working on it."* at roughly the 2-second mark, then your answer plays after.
26
+ - **More than ~25 seconds:** Alexa will time out and the relay tells the user *"I'll reply in your chat when ready."* — your eventual reply still lands in chat, but the voice closure is lost.
27
+
28
+ So: be brief, decide quickly, and use the deferred patterns below for anything you know will be slow.
29
+
30
+ ---
31
+
32
+ ## Voice style — write for speech, not for screens
33
+
34
+ When you see `[Alexa | ...]`, change how you write:
35
+
36
+ - **One or two sentences max** unless the user explicitly asks for detail
37
+ - **No markdown.** No `**bold**`, no `# headers`, no bullet lists, no code blocks. Alexa reads characters literally — `**` becomes "asterisk asterisk"
38
+ - **No URLs.** A voice user can't tap them. If a link is essential, say *"I sent the link to your chat"* and put it in chat
39
+ - **No emojis.** They become silence or weird filler
40
+ - **Spell out numbers naturally.** Prefer "twelve oh one PM" over "12:01 PM", although Alexa's TTS handles common forms acceptably
41
+ - **Conversational tone.** Imagine you're answering a friend across the room. Don't say *"I will now retrieve..."* — say *"On it."*
42
+ - **Don't repeat the question.** *"You asked what time it is. It's noon."* is terrible. Just say *"It's noon."*
43
+
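Some of the style rules above can be enforced mechanically before a reply is spoken. The following is a minimal Python sketch, not part of bloby: `to_speech_text` is a hypothetical helper name, and the emoji filter is an approximation that only drops characters above U+FFFF, which covers most but not all emoji.

```python
import re

# Hypothetical helper, not part of bloby: cleans a reply for text-to-speech.
URL_RE = re.compile(r"https?://\S+")
MARKDOWN_RE = re.compile(r"[*`#]+")


def to_speech_text(text: str) -> str:
    """Strip URLs, common markdown markers, and most emoji from a reply."""
    text = URL_RE.sub("", text)
    text = MARKDOWN_RE.sub("", text)
    # Most emoji live above U+FFFF; this filter misses the few that do not.
    text = "".join(ch for ch in text if ord(ch) <= 0xFFFF)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())
```

A filter like this is only a backstop. Sentence length, tone, and number phrasing still have to be right in the reply itself.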
44
+ ---
45
+
46
+ ## How Messages Arrive
47
+
48
+ When an Alexa user talks to you, the supervisor wraps the utterance with context:
49
+
50
+ ```
51
+ [Alexa | user=...12345678 | device=...ABCD1234 | session=...XYZ123 | en-US]
52
+ What's on my schedule today?
53
+ ```
54
+
55
+ The format is: `[Alexa | user=<last8> | device=<last8> | session=<last6> | <locale>]`
56
+
57
+ - **user** — last 8 chars of the Alexa account ID. Stable per user across all their devices.
58
+ - **device** — last 8 chars of the specific Echo / Echo Show / Echo Auto that's talking to you right now. Use this to learn room mappings via memory: *"device ...ABCD1234 = kitchen Echo Show"*. When a new device ID appears, you can ask the user once which room it's in and remember it forever.
59
+ - **session** — Alexa session ID. If this is the same value across multiple turns, the user is in an open Morphy session (see "Phrasing patterns" below).
60
+ - **locale** — e.g. `en-US`. Use it for language/region defaults.
61
+
62
+ If the same user talks to you via chat or WhatsApp in the same conversation, you'll see different surface tags. Read the tag, respond appropriately for that surface.
63
+
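As an illustration of the tag format described above, here is a minimal Python sketch that splits the header line into its four fields. The names `AlexaTag` and `parse_alexa_tag` are hypothetical, and the regular expression assumes the tag always follows the exact layout shown in the example, which the source does not guarantee.

```python
import re
from typing import NamedTuple, Optional

# Assumes the exact layout: [Alexa | user=X | device=Y | session=Z | locale]
TAG_RE = re.compile(
    r"^\[Alexa \| user=(?P<user>\S+) \| device=(?P<device>\S+)"
    r" \| session=(?P<session>\S+) \| (?P<locale>[^\]\s]+)\]$"
)


class AlexaTag(NamedTuple):
    user: str
    device: str
    session: str
    locale: str


def parse_alexa_tag(first_line: str) -> Optional[AlexaTag]:
    """Return the tag fields, or None if the line is not an Alexa tag."""
    match = TAG_RE.match(first_line.strip())
    if match is None:
        return None
    return AlexaTag(**match.groupdict())
```

The values are returned exactly as they appear in the tag, including any leading `...`, so they can be matched against remembered room mappings such as *"device ...ABCD1234 = kitchen Echo Show"*.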
64
+ ---
65
+
66
+ ## How Users Talk to You via Alexa — Phrasing Patterns
67
+
68
+ The skill is published as **Morphy Agent** — that's both the name and the invocation phrase (two words).
69
+
70
+ ### Strongly recommend session mode
71
+
72
+ Alexa's NLU is much more reliable inside an open session than for one-shot commands. **Always recommend Pattern C (session mode) to users** unless their command is a single quick lookup.
73
+
74
+ | Pattern | Best for | Reliability |
75
+ |---|---|---|
76
+ | **A. One-shot question** — *"Alexa, ask Morphy Agent what time is it"* | Single quick lookup | ⚠️ Medium — invocation matching can occasionally fail |
77
+ | **B. One-shot command** — *"Alexa, tell Morphy Agent to send a message"* | Single quick imperative | ⚠️ Medium — same |
78
+ | **C. Session mode** — *"Alexa, open Morphy Agent"* then converse | **Anything multi-step OR anything important** | ✅ High — invocation matched once, all subsequent turns are loose-NLU free-form |
79
+
80
+ When a user reports unreliable one-shot behavior, ALWAYS suggest Pattern C: *"Try opening Morphy Agent first — say 'Alexa, open Morphy Agent', then once it greets you, just speak your command. It's much more reliable than the one-shot version."*
81
+
82
+ ### Pattern C session — what it looks like
83
+
84
+ > *"Alexa, open Morphy Agent"*
85
+ > → *"Morphy here, what can I help with?"*
86
+ >
87
+ > *"Add a yellow sticky note saying we tested Alexa"*
88
+ > → *"Adding it now."* → (sticky note added) → *"Done."*
89
+ >
90
+ > *"Now send a message to Alice that I'll be late"*
91
+ > → *"On it."* → (message sent) → *"Sent."*
92
+ >
93
+ > *"Stop"* ← closes session
94
+
95
+ In session mode, the skill uses **Dialog.ElicitSlot** to keep the mic open after every response. The user doesn't need to say "Alexa" again until they stop or the session times out (~8 seconds of silence).
96
+
97
+ ### Tip: Follow-up Mode for one-shots
98
+
99
+ Alexa's "Follow-up Mode" (Settings → Account → Device Settings → your Echo → Follow-up Mode) keeps the device listening for ~5 seconds after each reply. Enabling it makes one-shot Patterns A and B feel more like a session.
100
+
101
+ ### When users hit the FallbackIntent
102
+
103
+ If Alexa says *"Sorry, I didn't catch that. Try again?"* — that's the relay's FallbackIntent handler. The session stays open and the user just repeats themselves. The skill uses Dialog.ElicitSlot so the next attempt has loosened NLU and usually succeeds.
104
+
105
+ ### Why our NLU is more reliable now
106
+
107
+ The skill uses a **custom slot type** (`OpenInput`) instead of `AMAZON.SearchQuery`. Custom slots are more permissive about free-form input and don't aggressively re-correct utterances. They're seeded with ~40 realistic Morphy-style commands so the ASR biases correctly. You don't need to know the carrier samples — practically any natural utterance routes to AgentIntent with the raw text in the `Query` slot.
108
+
109
+ ---
110
+
111
+ ## CRITICAL — Always Start with a Preamble on Alexa Turns
112
+
113
+ Before calling any tool on an `[Alexa | ...]`-tagged message, **emit a short natural-language preamble** describing what you're about to do. The supervisor catches that preamble and streams it to Alexa as a Progressive Response — the user actually HEARS it through their Echo as you're working.
114
+
115
+ Examples of good preambles:
116
+
117
+ | User said | Your preamble (BEFORE any tool call) |
118
+ |---|---|
119
+ | *"send a message to Cortex saying X"* | *"On it — sending the message to Cortex now."* |
120
+ | *"summarize my unread emails"* | *"Let me pull up your inbox."* |
121
+ | *"what's on my calendar tomorrow"* | *"Checking your calendar."* |
122
+ | *"deploy the staging branch"* | *"Starting the staging deploy."* |
123
+
124
+ Why this matters: Alexa has a hard timeout (~30s) for the final response. Without a preamble, the user hears silence until you're done — and if you go over, Alexa says *"the requested skill did not provide a valid response"* and your real answer is lost. With a preamble streamed as Progressive Response, the connection stays alive AND the user hears you thinking. Then your final reply at the end plays cleanly.
125
+
126
+ **Pattern in your turn:**
127
+
128
+ ```
129
+ 1. Emit preamble text: "I'll do X now..."
130
+ (this is what Alexa speaks as Progressive Response while you work)
131
+ 2. Call your tools
132
+ 3. Emit final reply: "Done — Y is complete."
133
+ (this is what Alexa speaks as the final response)
134
+ ```
135
+
136
+ The supervisor sends a generic "Working on it." after 2.5 seconds of silence if you don't emit a preamble, so you have a safety net — but a real preamble is always better UX.
137
+
138
+ If you're confident the work will take longer than ~25 seconds, don't even try to fit it in the voice turn. Reply with *"I'll let you know when it's ready"* (no tools yet, just the text) — your turn ends fast, and you can complete the work via chat / WhatsApp / HA-announce on a follow-up.
139
+
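The ordering in the preamble pattern (preamble, then tools, then final reply) can be pictured as a generator. This is an illustrative Python sketch only: `alexa_turn` and `run_tools` are hypothetical names, and the real streaming is done by the supervisor, not by code like this.

```python
from typing import Callable, Iterator


def alexa_turn(preamble: str, run_tools: Callable[[], str]) -> Iterator[str]:
    """Yield the preamble first, then the final reply once the work is done."""
    # 1. Preamble: emitted before any tool call so it can be spoken right away.
    yield preamble
    # 2. Tool calls: only start after the preamble has been handed off.
    final_reply = run_tools()
    # 3. Final reply: spoken as the closing response of the turn.
    yield final_reply
```

The point of the shape is that no tool work starts until the preamble has been emitted, so the user never waits in silence for the first words.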
140
+ ---
141
+
142
+ ## The Three Response Patterns
143
+
144
+ Pick one based on your estimate of how long the task will take.
145
+
146
+ ### Pattern 1: Fast — answer in voice
147
+
148
+ For anything you can answer in under ~5 seconds (quick lookup, status check, simple Q&A, short computation).
149
+
150
+ ```
151
+ [Alexa | ...]
152
+ What time is it?
153
+
154
+ → It's twelve oh one.
155
+ ```
156
+
157
+ That's it. The user hears the answer cleanly.
158
+
159
+ ### Pattern 2: Deferred to chat — voice ack + chat completion
160
+
161
+ For medium tasks (~30s to a few minutes) where the user can reasonably check chat later. Examples: summarize 10 emails, do a short research lookup, generate a report.
162
+
163
+ ```
164
+ [Alexa | ...]
165
+ Summarize my unread emails.
166
+
167
+ → On it — I'll send the summary to your chat in a moment.
168
+ ```
169
+
170
+ After speaking that line, **continue the same turn**. Do the work. Write the result to chat normally. The user sees a notification on their phone / dashboard when it lands.
171
+
172
+ The trick: your VOICE response is "On it — I'll send to chat." Your CHAT response is the actual summary. Same turn, two different surfaces.
173
+
174
+ ### Pattern 3: Deferred + Alexa announce (if the user has Home Assistant)
175
+
176
+ For longer tasks (minutes), if the user has HA configured AND the result is worth interrupting them for. Examples: scheduled report finished, long-running CI passed, expensive multi-step task done.
177
+
178
+ ```
179
+ [Alexa | ...]
180
+ Refactor the auth module and run the test suite.
181
+
182
+ → I'll get on that and let you know when it's done.
183
+ ```
184
+
185
+ Then do the work. When finished, fire an HA announce via the `Bash` tool. Use the user's normal HA notify pattern (you should know their HA URL, token, and device target from prior conversations / env vars — same as you'd use for any other HA notify):
186
+
187
+ ```bash
188
+ curl -s -X POST "https://<their-ha-host>/api/services/notify/alexa_media" \
188
+   -H "Authorization: Bearer $HA_TOKEN" \
189
+   -H "Content-Type: application/json" \
190
+   -d '{"target": "<their_echo_device>", "message": "Refactor done. All tests pass.", "data": {"type": "announce"}}'
192
+ ```
193
+
194
+ Don't hardcode the URL/token/device in this skill — the user told you these once and you remember them. If you don't have them, ask the user (in chat, after the voice ack) and remember the answer for next time.
195
+
196
+ **Important:** only use Pattern 3 if you're reasonably sure the user has HA set up. If unsure, default to Pattern 2 (chat). Don't promise a voice announce you can't deliver.
197
+
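The choice between the three patterns can be summarized as a small decision function. This is a Python sketch under stated assumptions: `choose_pattern` is a hypothetical name, the 25-second limit comes from the latency guidance in this document, and the 120-second threshold is an assumed reading of "minutes" that the source does not specify.

```python
VOICE_LIMIT_SECONDS = 25    # from the text: beyond ~25s, do not attempt a voice answer
ANNOUNCE_MIN_SECONDS = 120  # assumption: "minutes" read as two minutes or more


def choose_pattern(
    estimated_seconds: float,
    has_home_assistant: bool,
    worth_interrupting: bool = False,
) -> str:
    """Return "voice", "chat", or "ha_announce" for an Alexa turn."""
    if estimated_seconds <= VOICE_LIMIT_SECONDS:
        return "voice"  # Pattern 1: answer within the voice turn
    if (
        has_home_assistant
        and worth_interrupting
        and estimated_seconds >= ANNOUNCE_MIN_SECONDS
    ):
        return "ha_announce"  # Pattern 3: voice ack now, HA announce later
    return "chat"  # Pattern 2: voice ack now, result in chat
```

Chat is the fallback whenever Home Assistant is missing or uncertain, which matches the rule not to promise a voice announce that cannot be delivered.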
198
+ ---
199
+
200
+ ## When the User Says "Open Morphy Agent" (LaunchRequest)
201
+
202
+ If you receive a message tagged `[Alexa | ...]` with no actual question (just an empty intro), the user opened the skill and is waiting for a prompt. Reply with a short, open-ended greeting:
203
+
204
+ ```
205
+ → Morphy here, what can I help with?
206
+ ```
207
+
208
+ Keep it to one short sentence: they're listening, they don't want a speech.
209
+
210
+ ---
211
+
212
+ ## Setup — Pairing a New Alexa
213
+
214
+ When your human asks to connect their Alexa to you:
215
+
216
+ 1. **Check the channel status** to know if Alexa is already enabled:
217
+
218
+ ```bash
219
+ curl -s http://localhost:7400/api/channels/alexa/status
220
+ ```
221
+
222
+ If `connected: true` and `info.linked: true`, they already paired. Confirm they want to re-pair (a new code replaces the prior linkage on that Alexa account).
223
+
224
+ 2. **Tell them to open the pair page** — send the URL as a relative path. The dashboard auto-converts it to a button:
225
+
226
+ ```
227
+ /api/channels/alexa/pair-page
228
+ ```
229
+
230
+ 3. They'll see a 6-digit code. Tell them to say to Alexa:
231
+
232
+ > *"Alexa, ask Morphy Agent to link with code 8 4 7 2 9 1"* (their actual code, digit by digit reads more reliably)
233
+
234
+ 4. Alexa replies *"Linked successfully. What can I help with?"* — done. They can now talk to you via Alexa.
235
+
236
+ **Prerequisites the human must do once, OUTSIDE this flow:**
237
+ - Enable the "Morphy Agent" skill in their Alexa app (App → More → Skills → search "Morphy Agent" → Enable). You can't do this for them.
238
+
239
+ ---
240
+
241
+ ## Status & Disconnect
242
+
243
+ **Check status:**
244
+ ```bash
245
+ curl -s http://localhost:7400/api/channels/alexa/status
246
+ ```
247
+
248
+ Expected when paired: `{"channel":"alexa","connected":true,"info":{"linked":true}}`
249
+
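To decide whether a user is already paired, the status body can be checked for both flags. A minimal Python sketch, assuming the response has the shape shown above; `is_paired` is a hypothetical helper, and any body that is missing fields or is not valid JSON is treated as not paired.

```python
import json


def is_paired(status_body: str) -> bool:
    """True only when the body reports connected and info.linked as true."""
    try:
        status = json.loads(status_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(status, dict):
        return False
    info = status.get("info")
    if not isinstance(info, dict):
        return False
    return status.get("connected") is True and info.get("linked") is True
```

The input would be the output of the `curl` status call above. When the function returns `False`, the pairing flow in "Setup" applies.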
250
+ **Disconnect / clear the linkage:** there's no built-in unlink endpoint yet. To stop responding to Alexa, the user can either disable the Morphy Agent skill in the Alexa app, or open a new pair page and re-link (which invalidates the previous Alexa account's link).
251
+
252
+ ---
253
+
254
+ ## What This Skill Does NOT Cover
255
+
256
+ - **Multiple Echos per user with room-aware routing.** Today every Echo on a linked Alexa account routes to the same bloby. The supervisor sees the `deviceId` on each request — adding "if kitchen Echo, append kitchen context" is a one-liner future extension, not built yet.
257
+ - **Account linking via OAuth.** Bloby uses code-based pairing instead. Don't try to set up Alexa account linking — it'll conflict.
258
+ - **Spoken push without Home Assistant.** Outside of HA-announce (Pattern 3), there is NO supported way to make Alexa speak something unsolicited. If the user asks for that and doesn't have HA, tell them honestly that it's a current Alexa platform limit.
259
+
260
+ ---
261
+
262
+ ## API Reference
263
+
264
+ | Endpoint | Method | Purpose |
265
+ |----------|--------|---------|
266
+ | `/api/channels/status` | GET | All channel statuses (alexa appears here when enabled) |
267
+ | `/api/channels/alexa/status` | GET | Alexa channel status specifically |
268
+ | `/api/channels/alexa/pair` | POST | Mint a pairing code (this is what the pair-page calls) |
269
+ | `/api/channels/alexa/pair-page` | GET | The pairing UI to show the human |
270
+ | `/api/channels/alexa/handle` | POST | Inbound from the relay — relay-authed only, don't call this manually |
271
+
272
+ Use `http://localhost:7400` for `curl` from your terminal. Use the relative `/api/channels/alexa/pair-page` path when sending to the human's browser.
273
+
274
+ ---
275
+
276
+ ## Technical Notes
277
+
278
+ - **The relay (api.bloby.bot) handles signature verification + Progressive Response.** You never see Alexa's raw signed envelope; you just see the user's utterance text wrapped in `[Alexa | ...]`.
279
+ - **Per-user shared secret** is generated by the relay on first pair, stored in your config at `channels.alexa.sharedSecret`, and used to authenticate inbound relay→Pi forwards. If you ever see auth failures on `/api/channels/alexa/handle`, the user can mint a new code to refresh the secret.
280
+ - **No proactive sends.** Unlike WhatsApp's `/api/channels/send`, there is no `send` endpoint for Alexa. Push is fundamentally not in Alexa's skill model. Use HA-announce (via Bash) if you need to push.
281
+ - **Cold-start latency.** The first turn after a long idle has model warmup cost. Lean toward Pattern 2 (chat-deferred) for the first Alexa interaction of a session if you're unsure it'll fit in the window.
@@ -0,0 +1,15 @@
1
+ {
2
+ "name": "alexa",
3
+ "version": "1.0.0",
4
+ "type": "skill",
5
+ "bloby_human": "Bruno Bertapeli",
6
+ "bloby": "bloby-bruno",
7
+ "author": "newbot-official",
8
+ "description": "Alexa voice channel for your agent via the public 'Morphy' skill. Code-based pairing, voice-first response style, three-pattern decision tree (fast / chat-deferred / HA-announce-deferred).",
9
+ "depends": [],
10
+ "env_keys": [],
11
+ "has_telemetry": false,
12
+ "size": "4KB",
13
+ "contains_binaries": false,
14
+ "tags": ["alexa", "channel", "voice", "morphy"]
15
+ }