open-agents-ai 0.187.580 → 0.187.582

package/README.md CHANGED
@@ -131,6 +131,7 @@ An autonomous multi-turn tool-calling agent that reads your code, makes changes,
131
131
  - [Research Foundations](#research-foundations)
132
132
  - [Voice Feedback (TTS)](#voice-feedback-tts)
133
133
  - [LuxTTS Voice Cloning](#luxtts-voice-cloning)
134
+ - [Supertonic Expressive Tags](#supertonic-expressive-tags)
134
135
  - [Narration Engine Architecture](#narration-engine-architecture)
135
136
  - [Emotion-Driven Prosody (SEST)](#emotion-driven-prosody-sest)
136
137
  - [Personality-Aware Voice](#personality-aware-voice)
@@ -148,6 +149,8 @@ An autonomous multi-turn tool-calling agent that reads your code, makes changes,
148
149
  - [Browser Automation](#browser-automation)
149
150
  - [Interactive TUI](#interactive-tui)
150
151
  - [Slash Commands](#slash-commands)
152
+ - [Platform Connectors](#platform-connectors)
153
+ - [Workspace Explorer](#workspace-explorer)
151
154
  - [Mid-Task Steering (Sub-Agent Architecture)](#mid-task-steering-sub-agent-architecture)
152
155
  - [Telegram Bridge — Sub-Agent Per Chat](#telegram-bridge--sub-agent-per-chat)
153
156
  - [Admin Slash Command Passthrough](#admin-slash-command-passthrough)
@@ -281,6 +284,8 @@ The agent uses tools autonomously in a loop — reading errors, fixing code, and
281
284
  - **Littleman Observer** — parallel meta-analysis system that watches the agent loop in real-time. Detects false failure claims after successful tools, blocks redundant re-execution, catches runaway one-sided output in conversations, and dynamically extends turn limits when active work is detected. Emits `debug_context` and `debug_littleman` events for live observability
282
285
  - **Interactive Session Lock** — generic `SESSION_ACTIVE` protocol prevents premature task completion during long-running sessions (phone calls, live chat, monitoring). Any MCP contract can adopt the protocol. Paired with context-engineered system prompts that teach small models to maintain conversation loops
283
286
  - **Voice Chat** — `/voicechat` starts an async voice conversation that runs parallel to the main agent loop. Mic audio is transcribed via Whisper and injected as user messages; agent responses are synthesized to speech via TTS. Neither blocks the other — talk to the agent while it works
287
+ - **Platform connector menus** — `/platforms menu` opens a TUI onboarding and status surface for configuring the Telegram, Discord, Slack, Matrix, and webhook adapters. `/gateway` is an alias for the same connector surface
288
+ - **Workspace explorer** — `/files`, `/files <query>`, and `/files menu` provide a root-bounded working-directory browser with searchable file selection, noisy-directory filtering, file classification, and inline previews in the TUI scrollback
284
289
 
285
290
  ### Cross-Modal Workers
286
291
 
@@ -312,6 +317,7 @@ The daemon auto-installs Python dependencies (OpenCLIP, torchaudio + soundfile,
312
317
  - **Call Sub-Agent** — each WebSocket caller gets a dedicated AgenticRunner for low-latency voice-to-voice loops, with admin/public access tiers and bidirectional activity sharing with the main agent
313
318
  - **Telegram Voice** — `/voice` enabled via Telegram forwards TTS audio as voice messages alongside text responses. Incoming voice messages are auto-transcribed and handled as text
314
319
  - **Neural TTS** — hear what the agent is doing via GLaDOS, Overwatch, Kokoro, or LuxTTS voice clone, with literature-grounded narration engine (sNeuron-TST structure rotation, Moshi ring buffer dedup, UDDETTS emotion-driven prosody, SEST metadata, LuxTTS flow-matching voice cloning)
320
+ - **Supertonic expressive tags** — when `/voice supertonic` is active, OA inserts supported expression tags such as `<sigh>`, `<breath>`, and `<laugh>` into spoken status updates based on failure, recovery, sentence boundaries, success, and playful tone. Other voice backends receive sanitized plain text
315
321
  - **Personality Core** — SAC framework-based style control (concise/balanced/verbose/pedagogical) that shapes agent response depth, voice expressiveness, and system prompt behavior
316
322
  - **Human expert speed ratio** — real-time `Exp: Nx` gauge comparing agent speed to a leading human expert, calibrated across 47 tool baselines
317
323
  - **Cost tracking** — real-time token cost estimation for 15+ cloud providers
@@ -2814,6 +2820,7 @@ The emotion system is informed by peer-reviewed and preprint research:
2814
2820
  /voice overwatch # Overwatch voice (ONNX, ~50MB)
2815
2821
  /voice kokoro # Kokoro voice (MLX, macOS Apple Silicon)
2816
2822
  /voice luxtts # LuxTTS voice clone (flow-matching, any platform)
2823
+ /voice supertonic # Supertonic3 voice with expressive tags
2817
2824
  /voice clone <file> # Set clone reference audio for LuxTTS (wav/mp3/ogg/flac)
2818
2825
  /voice clone glados # Generate clone ref from GLaDOS → LuxTTS
2819
2826
  /voice clone overwatch # Generate clone ref from Overwatch → LuxTTS
@@ -2845,6 +2852,20 @@ Auto-downloads the ONNX voice model (~50MB) on first use. LuxTTS is the primary
2845
2852
 
2846
2853
  Output: 48kHz WAV, compatible with Telegram voice messages and WebSocket streaming.
2847
2854
 
2855
+ ### Supertonic Expressive Tags
2856
+
2857
+ When Supertonic is the active voice backend, OA decorates spoken status updates with the expression tags Supertonic supports. The tag pass runs after markdown/ANSI cleanup and only for Supertonic, so GLaDOS, Overwatch, Kokoro, and LuxTTS continue receiving plain sanitized text.
2858
+
2859
+ Tag placement is context-aware:
2860
+
2861
+ | Tag | Used for |
2862
+ |-----|----------|
2863
+ | `<sigh>` | Failures, retry loops, blocked work, and error recovery |
2864
+ | `<breath>` | Natural pauses between sentences or before follow-up checks |
2865
+ | `<laugh>` | Playful or successful moments where a lighter delivery fits |
2866
+
2867
+ Existing tags are preserved and duplicate tags are suppressed. The result is still concise TUI narration, but failure modes, recovery, success, and playful updates sound less flat when `/voice supertonic` is selected.
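The placement rules in the table can be illustrated with a minimal Python sketch. It is not OA's implementation: the function name `decorate_for_supertonic`, the `mood` labels, and the exact spots where tags are inserted are assumptions made for illustration. Only the three tags and the "preserve existing, suppress duplicates, plain text for other backends" behavior come from the description above.

```python
import re

# The three expression tags named in the table above.
TAG_RE = re.compile(r"<(?:sigh|breath|laugh)>")

# Hypothetical mood labels mapped to a leading tag (an assumption, not OA's API).
MOOD_TAG = {
    "failure": "<sigh>",
    "recovery": "<sigh>",
    "success": "<laugh>",
    "playful": "<laugh>",
}


def decorate_for_supertonic(text: str, mood: str, backend: str) -> str:
    """Add expression tags for Supertonic; return plain text for other backends."""
    if backend != "supertonic":
        # Other voice backends receive sanitized text with no tags.
        return re.sub(r"\s{2,}", " ", TAG_RE.sub("", text)).strip()

    # Insert a pause between sentences unless a tag already follows the boundary.
    text = re.sub(r"([.!?])\s+(?![\s<])", r"\1 <breath> ", text)

    # Prefix a mood tag only when the text does not already start with a tag.
    lead = MOOD_TAG.get(mood)
    if lead and not TAG_RE.match(text):
        text = f"{lead} {text}"

    # Collapse immediate repeats such as "<sigh> <sigh>" into a single tag.
    return re.sub(r"(<(?:sigh|breath|laugh)>)(?:\s*\1)+", r"\1", text)


print(decorate_for_supertonic("Build failed. Retrying now.", "failure", "supertonic"))
# <sigh> Build failed. <breath> Retrying now.
```

Running the tag pass last, after markdown/ANSI cleanup, keeps the angle-bracket tags from being stripped as markup by the earlier sanitizer.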
2868
+
2848
2869
  ### Narration Engine Architecture
2849
2870
 
2850
2871
  The voice narration system produces **zero static phrase pools** — every spoken sentence is dynamically composed from live tool state, session metrics, and emotion coordinates. The architecture is grounded in 2024-2026 TTS and emotion research:
@@ -3271,7 +3292,7 @@ The TUI features an animated multilingual phrase carousel, live metrics bar with
3271
3292
  | `/compact` | Force context compaction now (default strategy) |
3272
3293
  | `/compact <strategy>` | Compact with strategy: `aggressive`, `decisions`, `errors`, `summary`, `structured` |
3273
3294
  | **Audio & Vision** | |
3274
- | `/voice [model]` | Toggle TTS voice (GLaDOS, Overwatch, Kokoro, LuxTTS) |
3295
+ | `/voice [model]` | Toggle TTS voice (GLaDOS, Overwatch, Kokoro, LuxTTS, Supertonic) |
3275
3296
  | `/listen [mode]` | Toggle live microphone transcription |
3276
3297
  | `/dream [mode]` | Start dream mode (default, deep, lucid) |
3277
3298
  | **Display & Behavior** | |
@@ -3284,6 +3305,29 @@ The TUI features an animated multilingual phrase carousel, live metrics bar with
3284
3305
  | `/tools` | List agent-created custom tools |
3285
3306
  | `/skills [keyword]` | List/search available AIWG skills |
3286
3307
  | `/<skill-name> [args]` | Invoke an AIWG skill directly |
3308
+ | **Workspace Explorer** | |
3309
+ | `/files` | Show searchable working-directory file overview |
3310
+ | `/files <query>` | Filter the working-directory file overview |
3311
+ | `/files menu` | Open searchable workspace file selector and inline preview |
3312
+ | `/workspace` | Alias for `/files` |
3313
+ | `/workspace <query>` | Alias for `/files <query>` |
3314
+ | `/workspace menu` | Alias for `/files menu` |
3315
+ | **Platform Connectors** | |
3316
+ | `/platforms` | Show platform adapter state |
3317
+ | `/platforms status` | Show platform adapter state |
3318
+ | `/platforms menu` | Open interactive platform connectivity menu |
3319
+ | `/platforms setup` | Open platform onboarding menu |
3320
+ | `/platforms onboard` | Open platform onboarding menu |
3321
+ | `/platforms config` | Show redacted platform adapter configuration |
3322
+ | `/platforms enable <id>` | Enable a platform adapter |
3323
+ | `/platforms disable <id>` | Disable a platform adapter |
3324
+ | `/platforms set <id> token-env <ENV>` | Set adapter token environment variable |
3325
+ | `/platforms set <id> base-url <url>` | Set adapter base URL or webhook URL |
3326
+ | `/platforms set <id> target <target>` | Set default channel, room, or target |
3327
+ | `/platforms set <id> mode <polling\|webhook>` | Set adapter polling or webhook mode |
3328
+ | `/platforms clear <id> <key>` | Clear saved adapter token, token-env, base-url, or target |
3329
+ | `/gateway` | Alias for `/platforms` |
3330
+ | `/gateway menu` | Alias for `/platforms menu` |
3287
3331
  | **P2P & Secrets** | |
3288
3332
  | `/p2p start` | Start the P2P inference mesh node |
3289
3333
  | `/p2p connect <url>` | Connect to a remote peer |
@@ -3317,6 +3361,29 @@ The TUI features an animated multilingual phrase carousel, live metrics bar with
3317
3361
 
3318
3362
  All settings commands accept `--local` to save to project `.oa/settings.json` instead of global config.
3319
3363
 
3364
+ ### Platform Connectors
3365
+
3366
+ `/platforms menu` opens a keyboard-driven connectivity surface for platform status and onboarding. The same surface is available through `/gateway menu`.
3367
+
3368
+ Supported adapter IDs are `telegram`, `discord`, `slack`, `matrix`, and `webhook`. Telegram is wired into the existing bridge controls, so the menu can save the bot token and admin user ID, start or stop the bridge, and show status. The other adapters expose the shared connector configuration surface: enable/disable, token environment variable, base URL or webhook URL, default target (channel or room), polling or webhook mode, redacted config display, and status/health reporting.
3369
+
3370
+ Reference credentials through environment variables; `token-env` takes the variable name, not the token itself:
3371
+
3372
+ ```bash
3373
+ /platforms set slack token-env SLACK_BOT_TOKEN
3374
+ /platforms set slack base-url https://slack.com/api
3375
+ /platforms set slack target C0123456789
3376
+ /platforms set slack mode webhook
3377
+ /platforms enable slack
3378
+ /platforms config
3379
+ ```
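To make the "token environment variable" and "redacted config" ideas concrete, here is a minimal Python sketch of how such a connector store could behave. It is an illustration under assumptions, not OA's code: `AdapterConfig`, `apply_set`, `redacted_view`, the `polling` default, and the `***` / `(unset)` markers are all hypothetical. Only the adapter IDs and the settable keys are taken from the command table.

```python
import os
from dataclasses import asdict, dataclass
from typing import Dict, Mapping, Optional

ADAPTER_IDS = {"telegram", "discord", "slack", "matrix", "webhook"}

# Keys accepted by `/platforms set <id> <key> <value>`, mapped to field names.
SETTABLE = {"token-env": "token_env", "base-url": "base_url",
            "target": "target", "mode": "mode"}


@dataclass
class AdapterConfig:
    enabled: bool = False
    token_env: Optional[str] = None  # name of the env var, never the secret
    base_url: Optional[str] = None
    target: Optional[str] = None
    mode: str = "polling"            # assumed default


def apply_set(configs: Dict[str, AdapterConfig], adapter_id: str,
              key: str, value: str) -> None:
    """Validate and store one setting for one adapter."""
    if adapter_id not in ADAPTER_IDS:
        raise ValueError(f"unknown adapter: {adapter_id}")
    if key not in SETTABLE:
        raise ValueError(f"unknown key: {key}")
    if key == "mode" and value not in ("polling", "webhook"):
        raise ValueError("mode must be 'polling' or 'webhook'")
    cfg = configs.setdefault(adapter_id, AdapterConfig())
    setattr(cfg, SETTABLE[key], value)


def redacted_view(cfg: AdapterConfig,
                  env: Optional[Mapping[str, str]] = None) -> dict:
    """Report whether a token is resolvable without ever exposing its value."""
    env = os.environ if env is None else env
    view = asdict(cfg)
    has_token = bool(cfg.token_env and env.get(cfg.token_env))
    view["token"] = "***" if has_token else "(unset)"
    return view
```

Because only the variable name is persisted, the saved configuration can be displayed or committed without leaking the secret, and the token is resolved from the environment at the moment it is needed.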
3380
+
3381
+ ### Workspace Explorer
3382
+
3383
+ `/files` renders a compact working-directory overview directly in the TUI scrollback. It classifies files as source, test, docs, config, assets, or other, skips noisy directories like `.git`, `node_modules`, build outputs, caches, and publish artifacts, and keeps all preview reads bounded to the current workspace root.
3384
+
3385
+ Use `/files <query>` for fast path filtering, or `/files menu` for the searchable selector. Selecting a file prints an inline, line-numbered preview without changing editor focus. `/workspace`, `/workspace <query>`, and `/workspace menu` are aliases for the same explorer.
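The three behaviors described above (noisy-directory filtering, file classification, and root-bounded previews) can be sketched in Python as follows. This is an illustrative sketch, not the explorer's actual code: the names `list_files`, `classify`, and `preview`, the extension sets, and the exact `NOISY_DIRS` entries beyond `.git` and `node_modules` are assumptions.

```python
import os
from pathlib import Path
from typing import List, Tuple

# `.git` and `node_modules` come from the text; the rest are assumed examples
# of build outputs and caches.
NOISY_DIRS = {".git", "node_modules", "dist", "build", ".cache", "__pycache__"}

SOURCE_EXT = {".ts", ".js", ".py", ".go", ".rs"}
DOC_EXT = {".md", ".rst", ".txt"}
CONFIG_EXT = {".json", ".yaml", ".yml", ".toml", ".ini"}
ASSET_EXT = {".png", ".jpg", ".svg", ".gif", ".wav"}


def classify(path: Path) -> str:
    """Bucket a file as test, source, docs, config, assets, or other."""
    name, ext = path.name.lower(), path.suffix.lower()
    if ".test." in name or ".spec." in name or name.startswith("test_"):
        return "test"
    if ext in SOURCE_EXT:
        return "source"
    if ext in DOC_EXT:
        return "docs"
    if ext in CONFIG_EXT:
        return "config"
    if ext in ASSET_EXT:
        return "assets"
    return "other"


def list_files(root: Path, query: str = "") -> List[Tuple[str, str]]:
    """Walk the workspace, pruning noisy directories and filtering by path."""
    root = root.resolve()
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning in place stops os.walk from descending into noisy directories.
        dirnames[:] = sorted(d for d in dirnames if d not in NOISY_DIRS)
        for fname in sorted(filenames):
            rel = (Path(dirpath) / fname).relative_to(root)
            if query.lower() in rel.as_posix().lower():
                found.append((rel.as_posix(), classify(rel)))
    return found


def preview(root: Path, rel_path: str, max_lines: int = 20) -> List[str]:
    """Return a line-numbered preview, refusing paths outside the root."""
    root = root.resolve()
    target = (root / rel_path).resolve()
    if root not in target.parents:
        raise PermissionError(f"{rel_path} is outside the workspace root")
    with target.open(encoding="utf-8", errors="replace") as fh:
        return [f"{n:>4} {line.rstrip()}"
                for n, line in zip(range(1, max_lines + 1), fh)]
```

Resolving both paths before the containment check is what defeats `..` segments and symlinks that would otherwise escape the workspace root.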
3386
+
3320
3387
  ### Mid-Task Steering (Sub-Agent Architecture)
3321
3388
 
3322
3389
  While the agent is working (shown by the `+` prompt), type to add context. A **dedicated steering sub-agent** spins up in the background to process your input: