nodmix 2026.5.25
This diff shows the content of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +11573 -0
- package/LICENSE +21 -0
- package/README.md +486 -0
- package/docs/.i18n/README.md +81 -0
- package/docs/.i18n/ar-navigation.json +18 -0
- package/docs/.i18n/de-navigation.json +18 -0
- package/docs/.i18n/es-navigation.json +18 -0
- package/docs/.i18n/fr-navigation.json +18 -0
- package/docs/.i18n/glossary.ar.json +78 -0
- package/docs/.i18n/glossary.de.json +78 -0
- package/docs/.i18n/glossary.es.json +78 -0
- package/docs/.i18n/glossary.fa.json +78 -0
- package/docs/.i18n/glossary.fr.json +78 -0
- package/docs/.i18n/glossary.id.json +78 -0
- package/docs/.i18n/glossary.it.json +78 -0
- package/docs/.i18n/glossary.ja-JP.json +98 -0
- package/docs/.i18n/glossary.ko.json +78 -0
- package/docs/.i18n/glossary.nl.json +78 -0
- package/docs/.i18n/glossary.pl.json +78 -0
- package/docs/.i18n/glossary.pt-BR.json +78 -0
- package/docs/.i18n/glossary.th.json +78 -0
- package/docs/.i18n/glossary.tr.json +78 -0
- package/docs/.i18n/glossary.uk.json +78 -0
- package/docs/.i18n/glossary.vi.json +78 -0
- package/docs/.i18n/glossary.zh-CN.json +1002 -0
- package/docs/.i18n/glossary.zh-TW.json +78 -0
- package/docs/.i18n/id-navigation.json +18 -0
- package/docs/.i18n/it-navigation.json +18 -0
- package/docs/.i18n/ja-navigation.json +18 -0
- package/docs/.i18n/ko-navigation.json +18 -0
- package/docs/.i18n/pl-navigation.json +18 -0
- package/docs/.i18n/pt-BR-navigation.json +18 -0
- package/docs/.i18n/tr-navigation.json +18 -0
- package/docs/.i18n/translation-workflow.md +111 -0
- package/docs/.i18n/zh-Hans-navigation.json +542 -0
- package/docs/AGENTS.md +36 -0
- package/docs/announcements/bluebubbles-imessage.md +79 -0
- package/docs/assets/install-script.svg +1 -0
- package/docs/assets/macos-onboarding/01-macos-warning.jpeg +0 -0
- package/docs/assets/macos-onboarding/02-local-networks.jpeg +0 -0
- package/docs/assets/macos-onboarding/03-security-notice.png +0 -0
- package/docs/assets/macos-onboarding/04-choose-gateway.png +0 -0
- package/docs/assets/macos-onboarding/05-permissions.png +0 -0
- package/docs/assets/openclaw-logo-text-dark.png +0 -0
- package/docs/assets/openclaw-logo-text-dark.svg +418 -0
- package/docs/assets/openclaw-logo-text.png +0 -0
- package/docs/assets/openclaw-logo-text.svg +418 -0
- package/docs/assets/pixel-lobster.svg +60 -0
- package/docs/assets/pr/quick-settings-browser-tools.png +0 -0
- package/docs/assets/showcase/agents-ui.jpg +0 -0
- package/docs/assets/showcase/bambu-cli.png +0 -0
- package/docs/assets/showcase/codexmonitor.png +0 -0
- package/docs/assets/showcase/gohome-grafana.png +0 -0
- package/docs/assets/showcase/ios-testflight.jpg +0 -0
- package/docs/assets/showcase/oura-health.png +0 -0
- package/docs/assets/showcase/padel-cli.svg +11 -0
- package/docs/assets/showcase/padel-screenshot.jpg +0 -0
- package/docs/assets/showcase/papla-tts.jpg +0 -0
- package/docs/assets/showcase/pr-review-telegram.jpg +0 -0
- package/docs/assets/showcase/roborock-screenshot.jpg +0 -0
- package/docs/assets/showcase/roborock-status.svg +13 -0
- package/docs/assets/showcase/roof-camera-sky.jpg +0 -0
- package/docs/assets/showcase/snag.png +0 -0
- package/docs/assets/showcase/tesco-shop.jpg +0 -0
- package/docs/assets/showcase/wienerlinien.png +0 -0
- package/docs/assets/showcase/wine-cellar-skill.jpg +0 -0
- package/docs/assets/showcase/winix-air-purifier.jpg +0 -0
- package/docs/assets/showcase/xuezh-pronunciation.jpeg +0 -0
- package/docs/assets/sponsors/blacksmith-light.svg +14 -0
- package/docs/assets/sponsors/blacksmith.svg +14 -0
- package/docs/assets/sponsors/convex-light.svg +16 -0
- package/docs/assets/sponsors/convex.svg +16 -0
- package/docs/assets/sponsors/github-light.svg +3 -0
- package/docs/assets/sponsors/github.svg +3 -0
- package/docs/assets/sponsors/nvidia-dark.svg +9 -0
- package/docs/assets/sponsors/nvidia.svg +9 -0
- package/docs/assets/sponsors/openai-light.svg +3 -0
- package/docs/assets/sponsors/openai.svg +3 -0
- package/docs/assets/sponsors/vercel-light.svg +5 -0
- package/docs/assets/sponsors/vercel.svg +5 -0
- package/docs/auth-credential-semantics.md +124 -0
- package/docs/automation/auth-monitoring.md +11 -0
- package/docs/automation/clawflow.md +12 -0
- package/docs/automation/cron-jobs.md +500 -0
- package/docs/automation/cron-vs-heartbeat.md +11 -0
- package/docs/automation/gmail-pubsub.md +11 -0
- package/docs/automation/hooks.md +365 -0
- package/docs/automation/index.md +135 -0
- package/docs/automation/poll.md +12 -0
- package/docs/automation/standing-orders.md +250 -0
- package/docs/automation/taskflow.md +155 -0
- package/docs/automation/tasks.md +374 -0
- package/docs/automation/troubleshooting.md +12 -0
- package/docs/automation/webhook.md +12 -0
- package/docs/brave-search.md +11 -0
- package/docs/channels/access-groups.md +201 -0
- package/docs/channels/ambient-room-events.md +214 -0
- package/docs/channels/bot-loop-protection.md +131 -0
- package/docs/channels/broadcast-groups.md +472 -0
- package/docs/channels/channel-routing.md +162 -0
- package/docs/channels/clickclack.md +138 -0
- package/docs/channels/discord.md +1762 -0
- package/docs/channels/feishu.md +502 -0
- package/docs/channels/googlechat.md +284 -0
- package/docs/channels/group-messages.md +95 -0
- package/docs/channels/groups.md +519 -0
- package/docs/channels/imessage-from-bluebubbles.md +259 -0
- package/docs/channels/imessage.md +813 -0
- package/docs/channels/index.md +64 -0
- package/docs/channels/irc.md +253 -0
- package/docs/channels/line.md +243 -0
- package/docs/channels/location.md +71 -0
- package/docs/channels/matrix-migration.md +370 -0
- package/docs/channels/matrix-presentation.md +77 -0
- package/docs/channels/matrix-push-rules.md +150 -0
- package/docs/channels/matrix.md +921 -0
- package/docs/channels/mattermost.md +542 -0
- package/docs/channels/msteams.md +1042 -0
- package/docs/channels/nextcloud-talk.md +176 -0
- package/docs/channels/nostr.md +253 -0
- package/docs/channels/pairing.md +214 -0
- package/docs/channels/qqbot.md +309 -0
- package/docs/channels/signal.md +400 -0
- package/docs/channels/slack.md +1564 -0
- package/docs/channels/synology-chat.md +187 -0
- package/docs/channels/telegram.md +1107 -0
- package/docs/channels/tlon.md +296 -0
- package/docs/channels/troubleshooting.md +161 -0
- package/docs/channels/twitch.md +431 -0
- package/docs/channels/wechat.md +171 -0
- package/docs/channels/whatsapp.md +739 -0
- package/docs/channels/yuanbao.md +416 -0
- package/docs/channels/zalo.md +253 -0
- package/docs/channels/zalouser.md +199 -0
- package/docs/ci.md +612 -0
- package/docs/clawhub/publishing.md +96 -0
- package/docs/cli/acp.md +370 -0
- package/docs/cli/agent.md +103 -0
- package/docs/cli/agents.md +232 -0
- package/docs/cli/approvals.md +190 -0
- package/docs/cli/backup.md +97 -0
- package/docs/cli/browser.md +307 -0
- package/docs/cli/channels.md +154 -0
- package/docs/cli/clawbot.md +25 -0
- package/docs/cli/commitments.md +90 -0
- package/docs/cli/completion.md +39 -0
- package/docs/cli/config.md +504 -0
- package/docs/cli/configure.md +77 -0
- package/docs/cli/crestodian.md +332 -0
- package/docs/cli/cron.md +281 -0
- package/docs/cli/daemon.md +67 -0
- package/docs/cli/dashboard.md +33 -0
- package/docs/cli/devices.md +204 -0
- package/docs/cli/directory.md +68 -0
- package/docs/cli/dns.md +53 -0
- package/docs/cli/docs.md +73 -0
- package/docs/cli/doctor.md +237 -0
- package/docs/cli/flows.md +52 -0
- package/docs/cli/gateway.md +567 -0
- package/docs/cli/health.md +43 -0
- package/docs/cli/hooks.md +345 -0
- package/docs/cli/index.md +396 -0
- package/docs/cli/infer.md +364 -0
- package/docs/cli/logs.md +65 -0
- package/docs/cli/mcp.md +529 -0
- package/docs/cli/memory.md +183 -0
- package/docs/cli/message.md +317 -0
- package/docs/cli/migrate.md +290 -0
- package/docs/cli/models.md +224 -0
- package/docs/cli/node.md +177 -0
- package/docs/cli/nodes.md +76 -0
- package/docs/cli/onboard.md +245 -0
- package/docs/cli/pairing.md +77 -0
- package/docs/cli/path.md +502 -0
- package/docs/cli/plugins.md +454 -0
- package/docs/cli/policy.md +418 -0
- package/docs/cli/proxy.md +89 -0
- package/docs/cli/qr.md +56 -0
- package/docs/cli/reset.md +39 -0
- package/docs/cli/sandbox.md +208 -0
- package/docs/cli/secrets.md +202 -0
- package/docs/cli/security.md +124 -0
- package/docs/cli/sessions.md +164 -0
- package/docs/cli/setup.md +59 -0
- package/docs/cli/skills.md +102 -0
- package/docs/cli/status.md +45 -0
- package/docs/cli/system.md +89 -0
- package/docs/cli/tasks.md +111 -0
- package/docs/cli/tui.md +89 -0
- package/docs/cli/uninstall.md +44 -0
- package/docs/cli/update.md +242 -0
- package/docs/cli/voicecall.md +204 -0
- package/docs/cli/webhooks.md +117 -0
- package/docs/cli/wiki.md +256 -0
- package/docs/concepts/active-memory.md +856 -0
- package/docs/concepts/agent-loop.md +185 -0
- package/docs/concepts/agent-runtimes.md +243 -0
- package/docs/concepts/agent-workspace.md +230 -0
- package/docs/concepts/agent.md +136 -0
- package/docs/concepts/architecture.md +154 -0
- package/docs/concepts/channel-docking.md +145 -0
- package/docs/concepts/commitments.md +150 -0
- package/docs/concepts/compaction.md +203 -0
- package/docs/concepts/context-engine.md +306 -0
- package/docs/concepts/context.md +199 -0
- package/docs/concepts/delegate-architecture.md +319 -0
- package/docs/concepts/dreaming.md +261 -0
- package/docs/concepts/experimental-features.md +108 -0
- package/docs/concepts/features.md +91 -0
- package/docs/concepts/mantis-slack-desktop-runbook.md +202 -0
- package/docs/concepts/mantis.md +740 -0
- package/docs/concepts/markdown-formatting.md +139 -0
- package/docs/concepts/memory-builtin.md +146 -0
- package/docs/concepts/memory-honcho.md +144 -0
- package/docs/concepts/memory-qmd.md +271 -0
- package/docs/concepts/memory-search.md +166 -0
- package/docs/concepts/memory.md +258 -0
- package/docs/concepts/message-lifecycle-refactor.md +1128 -0
- package/docs/concepts/messages.md +214 -0
- package/docs/concepts/model-failover.md +385 -0
- package/docs/concepts/model-providers.md +715 -0
- package/docs/concepts/models.md +370 -0
- package/docs/concepts/multi-agent.md +619 -0
- package/docs/concepts/oauth.md +198 -0
- package/docs/concepts/openclaw-sdk.md +323 -0
- package/docs/concepts/parallel-specialist-lanes.md +127 -0
- package/docs/concepts/personal-agent-benchmark-pack.md +74 -0
- package/docs/concepts/presence.md +117 -0
- package/docs/concepts/progress-drafts.md +362 -0
- package/docs/concepts/qa-e2e-automation.md +820 -0
- package/docs/concepts/qa-matrix.md +139 -0
- package/docs/concepts/queue-steering.md +90 -0
- package/docs/concepts/queue.md +122 -0
- package/docs/concepts/retry.md +86 -0
- package/docs/concepts/session-pruning.md +104 -0
- package/docs/concepts/session-tool.md +190 -0
- package/docs/concepts/session.md +164 -0
- package/docs/concepts/soul.md +116 -0
- package/docs/concepts/streaming.md +251 -0
- package/docs/concepts/system-prompt.md +310 -0
- package/docs/concepts/timezone.md +47 -0
- package/docs/concepts/typebox.md +309 -0
- package/docs/concepts/typing-indicators.md +88 -0
- package/docs/concepts/usage-tracking.md +66 -0
- package/docs/date-time.md +126 -0
- package/docs/debug/node-issue.md +90 -0
- package/docs/diagnostics/flags.md +138 -0
- package/docs/docs.json +1832 -0
- package/docs/gateway/authentication.md +239 -0
- package/docs/gateway/background-process.md +147 -0
- package/docs/gateway/bonjour.md +303 -0
- package/docs/gateway/bridge-protocol.md +94 -0
- package/docs/gateway/cli-backends.md +420 -0
- package/docs/gateway/config-agents.md +1514 -0
- package/docs/gateway/config-channels.md +945 -0
- package/docs/gateway/config-tools.md +769 -0
- package/docs/gateway/configuration-examples.md +705 -0
- package/docs/gateway/configuration-reference.md +1393 -0
- package/docs/gateway/configuration.md +737 -0
- package/docs/gateway/diagnostics.md +213 -0
- package/docs/gateway/discovery.md +154 -0
- package/docs/gateway/doctor.md +574 -0
- package/docs/gateway/gateway-lock.md +37 -0
- package/docs/gateway/health.md +73 -0
- package/docs/gateway/heartbeat.md +493 -0
- package/docs/gateway/index.md +383 -0
- package/docs/gateway/local-model-services.md +205 -0
- package/docs/gateway/local-models.md +355 -0
- package/docs/gateway/logging.md +149 -0
- package/docs/gateway/multiple-gateways.md +178 -0
- package/docs/gateway/network-model.md +15 -0
- package/docs/gateway/openai-http-api.md +350 -0
- package/docs/gateway/openresponses-http-api.md +347 -0
- package/docs/gateway/openshell.md +316 -0
- package/docs/gateway/opentelemetry.md +404 -0
- package/docs/gateway/operator-scopes.md +111 -0
- package/docs/gateway/pairing.md +207 -0
- package/docs/gateway/prometheus.md +230 -0
- package/docs/gateway/protocol.md +803 -0
- package/docs/gateway/remote-gateway-readme.md +169 -0
- package/docs/gateway/remote.md +280 -0
- package/docs/gateway/sandbox-vs-tool-policy-vs-elevated.md +146 -0
- package/docs/gateway/sandboxing.md +545 -0
- package/docs/gateway/secrets-plan-contract.md +114 -0
- package/docs/gateway/secrets.md +609 -0
- package/docs/gateway/security/audit-checks.md +127 -0
- package/docs/gateway/security/index.md +1326 -0
- package/docs/gateway/security/secure-file-operations.md +76 -0
- package/docs/gateway/tailscale.md +156 -0
- package/docs/gateway/tools-invoke-http-api.md +169 -0
- package/docs/gateway/troubleshooting.md +772 -0
- package/docs/gateway/trusted-proxy-auth.md +451 -0
- package/docs/help/debugging.md +344 -0
- package/docs/help/environment.md +214 -0
- package/docs/help/faq-first-run.md +867 -0
- package/docs/help/faq-models.md +553 -0
- package/docs/help/faq.md +1975 -0
- package/docs/help/gpt55-codex-agentic-parity-maintainers.md +196 -0
- package/docs/help/gpt55-codex-agentic-parity.md +230 -0
- package/docs/help/index.md +39 -0
- package/docs/help/scripts.md +56 -0
- package/docs/help/testing-live.md +580 -0
- package/docs/help/testing-updates-plugins.md +291 -0
- package/docs/help/testing.md +928 -0
- package/docs/help/troubleshooting.md +424 -0
- package/docs/images/configure-model-picker-unsearchable.png +0 -0
- package/docs/images/feishu-get-group-id.png +0 -0
- package/docs/images/groups-flow.svg +52 -0
- package/docs/images/mobile-ui-screenshot.png +0 -0
- package/docs/index.md +196 -0
- package/docs/install/ansible.md +233 -0
- package/docs/install/azure.md +315 -0
- package/docs/install/bun.md +59 -0
- package/docs/install/clawdock.md +112 -0
- package/docs/install/development-channels.md +135 -0
- package/docs/install/digitalocean.md +174 -0
- package/docs/install/docker-vm-runtime.md +154 -0
- package/docs/install/docker.md +562 -0
- package/docs/install/exe-dev.md +201 -0
- package/docs/install/fly.md +524 -0
- package/docs/install/gcp.md +418 -0
- package/docs/install/hetzner.md +285 -0
- package/docs/install/hostinger.md +98 -0
- package/docs/install/index.md +221 -0
- package/docs/install/installer.md +455 -0
- package/docs/install/kubernetes.md +196 -0
- package/docs/install/macos-vm.md +281 -0
- package/docs/install/migrating-claude.md +165 -0
- package/docs/install/migrating-hermes.md +177 -0
- package/docs/install/migrating.md +137 -0
- package/docs/install/nix.md +112 -0
- package/docs/install/node.md +142 -0
- package/docs/install/northflank.mdx +44 -0
- package/docs/install/oracle.md +218 -0
- package/docs/install/podman.md +210 -0
- package/docs/install/railway.mdx +92 -0
- package/docs/install/raspberry-pi.md +234 -0
- package/docs/install/render.mdx +167 -0
- package/docs/install/uninstall.md +131 -0
- package/docs/install/updating.md +280 -0
- package/docs/logging.md +318 -0
- package/docs/nav-tabs-underline.js +100 -0
- package/docs/network.md +72 -0
- package/docs/nodes/audio.md +215 -0
- package/docs/nodes/camera.md +166 -0
- package/docs/nodes/images.md +77 -0
- package/docs/nodes/index.md +439 -0
- package/docs/nodes/location-command.md +102 -0
- package/docs/nodes/media-understanding.md +469 -0
- package/docs/nodes/talk.md +154 -0
- package/docs/nodes/troubleshooting.md +123 -0
- package/docs/nodes/voicewake.md +93 -0
- package/docs/perplexity.md +11 -0
- package/docs/pi-dev.md +82 -0
- package/docs/pi.md +573 -0
- package/docs/plan/codex-context-engine-harness.md +624 -0
- package/docs/plan/ui-channels.md +284 -0
- package/docs/platforms/android.md +285 -0
- package/docs/platforms/digitalocean.md +12 -0
- package/docs/platforms/index.md +60 -0
- package/docs/platforms/ios.md +283 -0
- package/docs/platforms/linux.md +141 -0
- package/docs/platforms/mac/bundled-gateway.md +79 -0
- package/docs/platforms/mac/canvas.md +128 -0
- package/docs/platforms/mac/child-process.md +72 -0
- package/docs/platforms/mac/dev-setup.md +112 -0
- package/docs/platforms/mac/health.md +39 -0
- package/docs/platforms/mac/icon.md +36 -0
- package/docs/platforms/mac/logging.md +62 -0
- package/docs/platforms/mac/menu-bar.md +93 -0
- package/docs/platforms/mac/peekaboo.md +92 -0
- package/docs/platforms/mac/permissions.md +53 -0
- package/docs/platforms/mac/remote.md +123 -0
- package/docs/platforms/mac/signing.md +52 -0
- package/docs/platforms/mac/skills.md +43 -0
- package/docs/platforms/mac/voice-overlay.md +66 -0
- package/docs/platforms/mac/voicewake.md +73 -0
- package/docs/platforms/mac/webchat.md +54 -0
- package/docs/platforms/mac/xpc.md +66 -0
- package/docs/platforms/macos.md +226 -0
- package/docs/platforms/oracle.md +12 -0
- package/docs/platforms/raspberry-pi.md +13 -0
- package/docs/platforms/windows.md +286 -0
- package/docs/plugins/adding-capabilities.md +133 -0
- package/docs/plugins/admin-http-rpc.md +216 -0
- package/docs/plugins/agent-tools.md +13 -0
- package/docs/plugins/architecture-internals.md +1195 -0
- package/docs/plugins/architecture.md +481 -0
- package/docs/plugins/building-extensions.md +13 -0
- package/docs/plugins/building-plugins.md +330 -0
- package/docs/plugins/bundles.md +310 -0
- package/docs/plugins/cli-backend-plugins.md +310 -0
- package/docs/plugins/codex-computer-use.md +293 -0
- package/docs/plugins/codex-harness-reference.md +409 -0
- package/docs/plugins/codex-harness-runtime.md +247 -0
- package/docs/plugins/codex-harness.md +746 -0
- package/docs/plugins/codex-native-plugins.md +276 -0
- package/docs/plugins/community.md +77 -0
- package/docs/plugins/compatibility.md +164 -0
- package/docs/plugins/dependency-resolution.md +143 -0
- package/docs/plugins/google-meet.md +1737 -0
- package/docs/plugins/hooks.md +459 -0
- package/docs/plugins/install-overrides.md +80 -0
- package/docs/plugins/manage-plugins.md +210 -0
- package/docs/plugins/manifest.md +1359 -0
- package/docs/plugins/memory-lancedb.md +385 -0
- package/docs/plugins/memory-wiki.md +529 -0
- package/docs/plugins/message-presentation.md +473 -0
- package/docs/plugins/oc-path.md +166 -0
- package/docs/plugins/plugin-inventory.md +182 -0
- package/docs/plugins/reference/acpx.md +23 -0
- package/docs/plugins/reference/admin-http-rpc.md +23 -0
- package/docs/plugins/reference/alibaba.md +23 -0
- package/docs/plugins/reference/amazon-bedrock-mantle.md +23 -0
- package/docs/plugins/reference/amazon-bedrock.md +23 -0
- package/docs/plugins/reference/anthropic-vertex.md +19 -0
- package/docs/plugins/reference/anthropic.md +23 -0
- package/docs/plugins/reference/arcee.md +23 -0
- package/docs/plugins/reference/azure-speech.md +23 -0
- package/docs/plugins/reference/bonjour.md +19 -0
- package/docs/plugins/reference/brave.md +23 -0
- package/docs/plugins/reference/browser.md +23 -0
- package/docs/plugins/reference/byteplus.md +19 -0
- package/docs/plugins/reference/canvas.md +19 -0
- package/docs/plugins/reference/cerebras.md +23 -0
- package/docs/plugins/reference/chutes.md +23 -0
- package/docs/plugins/reference/clickclack.md +23 -0
- package/docs/plugins/reference/cloudflare-ai-gateway.md +23 -0
- package/docs/plugins/reference/codex.md +23 -0
- package/docs/plugins/reference/comfy.md +23 -0
- package/docs/plugins/reference/copilot-proxy.md +19 -0
- package/docs/plugins/reference/deepgram.md +23 -0
- package/docs/plugins/reference/deepinfra.md +23 -0
- package/docs/plugins/reference/deepseek.md +23 -0
- package/docs/plugins/reference/diagnostics-otel.md +19 -0
- package/docs/plugins/reference/diagnostics-prometheus.md +19 -0
- package/docs/plugins/reference/diffs.md +19 -0
- package/docs/plugins/reference/discord.md +23 -0
- package/docs/plugins/reference/document-extract.md +23 -0
- package/docs/plugins/reference/duckduckgo.md +23 -0
- package/docs/plugins/reference/elevenlabs.md +23 -0
- package/docs/plugins/reference/exa.md +23 -0
- package/docs/plugins/reference/fal.md +23 -0
- package/docs/plugins/reference/feishu.md +23 -0
- package/docs/plugins/reference/file-transfer.md +19 -0
- package/docs/plugins/reference/firecrawl.md +23 -0
- package/docs/plugins/reference/fireworks.md +23 -0
- package/docs/plugins/reference/github-copilot.md +23 -0
- package/docs/plugins/reference/google-meet.md +23 -0
- package/docs/plugins/reference/google.md +23 -0
- package/docs/plugins/reference/googlechat.md +23 -0
- package/docs/plugins/reference/gradium.md +23 -0
- package/docs/plugins/reference/groq.md +23 -0
- package/docs/plugins/reference/huggingface.md +23 -0
- package/docs/plugins/reference/imessage.md +23 -0
- package/docs/plugins/reference/inworld.md +23 -0
- package/docs/plugins/reference/irc.md +23 -0
- package/docs/plugins/reference/kilocode.md +23 -0
- package/docs/plugins/reference/kimi.md +23 -0
- package/docs/plugins/reference/line.md +23 -0
- package/docs/plugins/reference/litellm.md +23 -0
- package/docs/plugins/reference/llm-task.md +19 -0
- package/docs/plugins/reference/lmstudio.md +23 -0
- package/docs/plugins/reference/lobster.md +19 -0
- package/docs/plugins/reference/matrix.md +23 -0
- package/docs/plugins/reference/mattermost.md +23 -0
- package/docs/plugins/reference/memory-core.md +19 -0
- package/docs/plugins/reference/memory-lancedb.md +23 -0
- package/docs/plugins/reference/memory-wiki.md +23 -0
- package/docs/plugins/reference/microsoft-foundry.md +19 -0
- package/docs/plugins/reference/microsoft.md +19 -0
- package/docs/plugins/reference/migrate-claude.md +19 -0
- package/docs/plugins/reference/migrate-hermes.md +19 -0
- package/docs/plugins/reference/minimax.md +23 -0
- package/docs/plugins/reference/mistral.md +23 -0
- package/docs/plugins/reference/moonshot.md +23 -0
- package/docs/plugins/reference/msteams.md +23 -0
- package/docs/plugins/reference/nextcloud-talk.md +23 -0
- package/docs/plugins/reference/nostr.md +23 -0
- package/docs/plugins/reference/nvidia.md +23 -0
- package/docs/plugins/reference/oc-path.md +23 -0
- package/docs/plugins/reference/ollama.md +23 -0
- package/docs/plugins/reference/open-prose.md +19 -0
- package/docs/plugins/reference/openai.md +23 -0
- package/docs/plugins/reference/opencode-go.md +23 -0
- package/docs/plugins/reference/opencode.md +23 -0
- package/docs/plugins/reference/openrouter.md +23 -0
- package/docs/plugins/reference/openshell.md +19 -0
- package/docs/plugins/reference/perplexity.md +23 -0
- package/docs/plugins/reference/policy.md +23 -0
- package/docs/plugins/reference/qa-channel.md +23 -0
- package/docs/plugins/reference/qa-lab.md +19 -0
- package/docs/plugins/reference/qa-matrix.md +19 -0
- package/docs/plugins/reference/qianfan.md +23 -0
- package/docs/plugins/reference/qqbot.md +23 -0
- package/docs/plugins/reference/qwen.md +23 -0
- package/docs/plugins/reference/runway.md +23 -0
- package/docs/plugins/reference/searxng.md +19 -0
- package/docs/plugins/reference/senseaudio.md +23 -0
- package/docs/plugins/reference/sglang.md +23 -0
- package/docs/plugins/reference/signal.md +23 -0
- package/docs/plugins/reference/skill-workshop.md +23 -0
- package/docs/plugins/reference/slack.md +23 -0
- package/docs/plugins/reference/stepfun.md +23 -0
- package/docs/plugins/reference/synology-chat.md +23 -0
- package/docs/plugins/reference/synthetic.md +23 -0
- package/docs/plugins/reference/tavily.md +23 -0
- package/docs/plugins/reference/telegram.md +23 -0
- package/docs/plugins/reference/tencent.md +23 -0
- package/docs/plugins/reference/tlon.md +23 -0
- package/docs/plugins/reference/together.md +23 -0
- package/docs/plugins/reference/tokenjuice.md +23 -0
- package/docs/plugins/reference/tts-local-cli.md +19 -0
- package/docs/plugins/reference/twitch.md +23 -0
- package/docs/plugins/reference/venice.md +23 -0
- package/docs/plugins/reference/vercel-ai-gateway.md +23 -0
- package/docs/plugins/reference/vllm.md +23 -0
- package/docs/plugins/reference/voice-call.md +23 -0
- package/docs/plugins/reference/volcengine.md +23 -0
- package/docs/plugins/reference/voyage.md +19 -0
- package/docs/plugins/reference/vydra.md +23 -0
- package/docs/plugins/reference/web-readability.md +19 -0
- package/docs/plugins/reference/webhooks.md +23 -0
- package/docs/plugins/reference/whatsapp.md +23 -0
- package/docs/plugins/reference/xai.md +23 -0
- package/docs/plugins/reference/xiaomi.md +23 -0
- package/docs/plugins/reference/zai.md +23 -0
- package/docs/plugins/reference/zalo.md +23 -0
- package/docs/plugins/reference/zalouser.md +24 -0
- package/docs/plugins/reference.md +138 -0
- package/docs/plugins/sdk-agent-harness.md +339 -0
- package/docs/plugins/sdk-channel-ingress.md +137 -0
- package/docs/plugins/sdk-channel-message.md +458 -0
- package/docs/plugins/sdk-channel-plugins.md +762 -0
- package/docs/plugins/sdk-channel-turn.md +580 -0
- package/docs/plugins/sdk-entrypoints.md +333 -0
- package/docs/plugins/sdk-migration.md +949 -0
- package/docs/plugins/sdk-overview.md +501 -0
- package/docs/plugins/sdk-provider-plugins.md +807 -0
- package/docs/plugins/sdk-runtime.md +676 -0
- package/docs/plugins/sdk-setup.md +550 -0
- package/docs/plugins/sdk-subpaths.md +396 -0
- package/docs/plugins/sdk-testing.md +401 -0
- package/docs/plugins/skill-workshop.md +713 -0
- package/docs/plugins/tool-plugins.md +411 -0
- package/docs/plugins/voice-call.md +943 -0
- package/docs/plugins/webhooks.md +192 -0
- package/docs/plugins/zalouser.md +86 -0
- package/docs/prose.md +137 -0
- package/docs/providers/alibaba.md +158 -0
- package/docs/providers/anthropic.md +344 -0
- package/docs/providers/arcee.md +144 -0
- package/docs/providers/azure-speech.md +119 -0
- package/docs/providers/bedrock-mantle.md +211 -0
- package/docs/providers/bedrock.md +414 -0
- package/docs/providers/cerebras.md +130 -0
- package/docs/providers/chutes.md +153 -0
- package/docs/providers/claude-max-api-proxy.md +188 -0
- package/docs/providers/cloudflare-ai-gateway.md +119 -0
- package/docs/providers/comfy.md +362 -0
- package/docs/providers/deepgram.md +184 -0
- package/docs/providers/deepinfra.md +87 -0
- package/docs/providers/deepseek.md +146 -0
- package/docs/providers/ds4.md +309 -0
- package/docs/providers/elevenlabs.md +130 -0
- package/docs/providers/fal.md +204 -0
- package/docs/providers/fireworks.md +144 -0
- package/docs/providers/github-copilot.md +225 -0
- package/docs/providers/glm.md +137 -0
- package/docs/providers/google.md +472 -0
- package/docs/providers/gradium.md +123 -0
- package/docs/providers/groq.md +180 -0
- package/docs/providers/huggingface.md +235 -0
- package/docs/providers/index.md +102 -0
- package/docs/providers/inferrs.md +272 -0
- package/docs/providers/inworld.md +120 -0
- package/docs/providers/kilocode.md +135 -0
- package/docs/providers/litellm.md +234 -0
- package/docs/providers/lmstudio.md +224 -0
- package/docs/providers/minimax.md +505 -0
- package/docs/providers/mistral.md +235 -0
- package/docs/providers/models.md +65 -0
- package/docs/providers/moonshot.md +413 -0
- package/docs/providers/nvidia.md +140 -0
- package/docs/providers/ollama.md +1180 -0
- package/docs/providers/openai.md +1057 -0
- package/docs/providers/opencode-go.md +123 -0
- package/docs/providers/opencode.md +149 -0
- package/docs/providers/openrouter.md +349 -0
- package/docs/providers/perplexity-provider.md +123 -0
- package/docs/providers/qianfan.md +132 -0
- package/docs/providers/qwen.md +332 -0
- package/docs/providers/runway.md +103 -0
- package/docs/providers/senseaudio.md +68 -0
- package/docs/providers/sglang.md +161 -0
- package/docs/providers/stepfun.md +229 -0
- package/docs/providers/synthetic.md +154 -0
- package/docs/providers/tencent.md +130 -0
- package/docs/providers/together.md +141 -0
- package/docs/providers/venice.md +315 -0
- package/docs/providers/vercel-ai-gateway.md +128 -0
- package/docs/providers/vllm.md +383 -0
- package/docs/providers/volcengine.md +199 -0
- package/docs/providers/vydra.md +180 -0
- package/docs/providers/xai.md +560 -0
- package/docs/providers/xiaomi.md +188 -0
- package/docs/providers/zai.md +203 -0
- package/docs/refactor/access.md +9 -0
- package/docs/refactor/acp.md +298 -0
- package/docs/refactor/canvas.md +131 -0
- package/docs/refactor/ingress-core.md +341 -0
- package/docs/reference/AGENTS.default.md +129 -0
- package/docs/reference/RELEASING.md +767 -0
- package/docs/reference/api-usage-costs.md +202 -0
- package/docs/reference/application-modernization-plan.md +208 -0
- package/docs/reference/code-mode.md +757 -0
- package/docs/reference/credits.md +33 -0
- package/docs/reference/device-models.md +50 -0
- package/docs/reference/full-release-validation.md +202 -0
- package/docs/reference/memory-config.md +630 -0
- package/docs/reference/openclaw-sdk-api-design.md +390 -0
- package/docs/reference/prompt-caching.md +358 -0
- package/docs/reference/rich-output-protocol.md +79 -0
- package/docs/reference/rpc.md +43 -0
- package/docs/reference/secretref-credential-surface.md +159 -0
- package/docs/reference/secretref-user-supplied-credentials-matrix.json +663 -0
- package/docs/reference/session-management-compaction.md +461 -0
- package/docs/reference/templates/AGENTS.dev.md +89 -0
- package/docs/reference/templates/AGENTS.md +225 -0
- package/docs/reference/templates/BOOT.md +16 -0
- package/docs/reference/templates/BOOTSTRAP.md +66 -0
- package/docs/reference/templates/HEARTBEAT.md +16 -0
- package/docs/reference/templates/IDENTITY.dev.md +52 -0
- package/docs/reference/templates/IDENTITY.md +34 -0
- package/docs/reference/templates/SOUL.dev.md +82 -0
- package/docs/reference/templates/SOUL.md +49 -0
- package/docs/reference/templates/TOOLS.dev.md +29 -0
- package/docs/reference/templates/TOOLS.md +51 -0
- package/docs/reference/templates/USER.dev.md +23 -0
- package/docs/reference/templates/USER.md +28 -0
- package/docs/reference/test.md +239 -0
- package/docs/reference/token-use.md +233 -0
- package/docs/reference/transcript-hygiene.md +214 -0
- package/docs/reference/wizard.md +252 -0
- package/docs/security/CONTRIBUTING-THREAT-MODEL.md +101 -0
- package/docs/security/THREAT-MODEL-ATLAS.md +611 -0
- package/docs/security/formal-verification.md +170 -0
- package/docs/security/incident-response.md +59 -0
- package/docs/security/network-proxy.md +268 -0
- package/docs/snippets/plugin-publish/minimal-openclaw.plugin.json +12 -0
- package/docs/snippets/plugin-publish/minimal-package.json +16 -0
- package/docs/start/bootstrapping.md +49 -0
- package/docs/start/docs-directory.md +69 -0
- package/docs/start/getting-started.md +152 -0
- package/docs/start/hubs.md +201 -0
- package/docs/start/lore.md +223 -0
- package/docs/start/onboarding-overview.md +72 -0
- package/docs/start/onboarding.md +95 -0
- package/docs/start/openclaw.md +244 -0
- package/docs/start/quickstart.md +25 -0
- package/docs/start/setup.md +178 -0
- package/docs/start/showcase.md +383 -0
- package/docs/start/wizard-cli-automation.md +232 -0
- package/docs/start/wizard-cli-reference.md +331 -0
- package/docs/start/wizard.md +141 -0
- package/docs/style.css +184 -0
- package/docs/superpowers/specs/2026-04-22-tweakcn-custom-theme-import-design.md +316 -0
- package/docs/tools/acp-agents-setup.md +352 -0
- package/docs/tools/acp-agents.md +847 -0
- package/docs/tools/agent-send.md +112 -0
- package/docs/tools/apply-patch.md +64 -0
- package/docs/tools/brave-search.md +139 -0
- package/docs/tools/browser-control.md +391 -0
- package/docs/tools/browser-linux-troubleshooting.md +173 -0
- package/docs/tools/browser-login.md +77 -0
- package/docs/tools/browser-wsl2-windows-remote-cdp-troubleshooting.md +219 -0
- package/docs/tools/browser.md +769 -0
- package/docs/tools/btw.md +159 -0
- package/docs/tools/capability-cookbook.md +12 -0
- package/docs/tools/clawhub.md +5 -0
- package/docs/tools/code-execution.md +173 -0
- package/docs/tools/creating-skills.md +120 -0
- package/docs/tools/diffs.md +506 -0
- package/docs/tools/duckduckgo-search.md +109 -0
- package/docs/tools/elevated.md +128 -0
- package/docs/tools/exa-search.md +152 -0
- package/docs/tools/exec-approvals-advanced.md +360 -0
- package/docs/tools/exec-approvals.md +474 -0
- package/docs/tools/exec.md +282 -0
- package/docs/tools/firecrawl.md +155 -0
- package/docs/tools/gemini-search.md +114 -0
- package/docs/tools/grok-search.md +113 -0
- package/docs/tools/image-generation.md +433 -0
- package/docs/tools/index.md +178 -0
- package/docs/tools/kimi-search.md +105 -0
- package/docs/tools/llm-task.md +137 -0
- package/docs/tools/lobster.md +365 -0
- package/docs/tools/loop-detection.md +154 -0
- package/docs/tools/media-overview.md +157 -0
- package/docs/tools/minimax-search.md +102 -0
- package/docs/tools/multi-agent-sandbox-tools.md +409 -0
- package/docs/tools/music-generation.md +371 -0
- package/docs/tools/ollama-search.md +153 -0
- package/docs/tools/pdf.md +195 -0
- package/docs/tools/perplexity-search.md +220 -0
- package/docs/tools/plugin.md +327 -0
- package/docs/tools/reactions.md +100 -0
- package/docs/tools/searxng-search.md +141 -0
- package/docs/tools/skills-config.md +195 -0
- package/docs/tools/skills.md +535 -0
- package/docs/tools/slash-commands.md +488 -0
- package/docs/tools/steer.md +84 -0
- package/docs/tools/subagents.md +650 -0
- package/docs/tools/tavily.md +162 -0
- package/docs/tools/thinking.md +140 -0
- package/docs/tools/tokenjuice.md +81 -0
- package/docs/tools/tool-search.md +269 -0
- package/docs/tools/trajectory.md +229 -0
- package/docs/tools/tts.md +1004 -0
- package/docs/tools/video-generation.md +552 -0
- package/docs/tools/web-fetch.md +195 -0
- package/docs/tools/web.md +459 -0
- package/docs/tts.md +11 -0
- package/docs/vps.md +139 -0
- package/docs/web/control-ui.md +503 -0
- package/docs/web/dashboard.md +107 -0
- package/docs/web/index.md +133 -0
- package/docs/web/tui.md +246 -0
- package/docs/web/webchat.md +99 -0
- package/docs/whatsapp-openclaw-ai-zh.jpg +0 -0
- package/docs/whatsapp-openclaw.jpg +0 -0
- package/nodmix.mjs +487 -0
- package/package.json +1852 -0
- package/patches/.gitkeep +0 -0
- package/patches/@agentclientprotocol__claude-agent-acp@0.36.1.patch +41 -0
- package/pnpm-workspace.yaml +63 -0
- package/scripts/crabbox-wrapper.mjs +353 -0
- package/scripts/lib/official-external-channel-catalog.json +559 -0
- package/scripts/lib/official-external-plugin-catalog.json +192 -0
- package/scripts/lib/official-external-provider-catalog.json +117 -0
- package/scripts/lib/package-dist-imports.mjs +171 -0
- package/scripts/npm-runner.mjs +91 -0
- package/scripts/postinstall-bundled-plugins.mjs +978 -0
- package/scripts/preinstall-package-manager-warning.mjs +64 -0
- package/scripts/windows-cmd-helpers.mjs +20 -0
- package/skills/1password/SKILL.md +70 -0
- package/skills/1password/references/cli-examples.md +29 -0
- package/skills/1password/references/get-started.md +17 -0
- package/skills/apple-notes/SKILL.md +77 -0
- package/skills/apple-reminders/SKILL.md +118 -0
- package/skills/bear-notes/SKILL.md +107 -0
- package/skills/blogwatcher/SKILL.md +69 -0
- package/skills/blucli/SKILL.md +47 -0
- package/skills/camsnap/SKILL.md +45 -0
- package/skills/canvas/SKILL.md +78 -0
- package/skills/clawhub/SKILL.md +77 -0
- package/skills/coding-agent/SKILL.md +149 -0
- package/skills/diagram-maker/SKILL.md +53 -0
- package/skills/diagram-maker/references/excalidraw-patterns.md +85 -0
- package/skills/diagram-maker/references/svg-template.md +112 -0
- package/skills/discord/SKILL.md +136 -0
- package/skills/eightctl/SKILL.md +50 -0
- package/skills/gemini/SKILL.md +47 -0
- package/skills/gh-issues/SKILL.md +213 -0
- package/skills/gifgrep/SKILL.md +85 -0
- package/skills/github/SKILL.md +84 -0
- package/skills/gog/SKILL.md +116 -0
- package/skills/goplaces/SKILL.md +52 -0
- package/skills/healthcheck/SKILL.md +105 -0
- package/skills/himalaya/SKILL.md +80 -0
- package/skills/himalaya/references/configuration.md +184 -0
- package/skills/himalaya/references/message-composition.md +199 -0
- package/skills/imsg/SKILL.md +122 -0
- package/skills/mcporter/SKILL.md +61 -0
- package/skills/meme-maker/SKILL.md +42 -0
- package/skills/meme-maker/references/templates.json +358 -0
- package/skills/meme-maker/scripts/meme.mjs +398 -0
- package/skills/model-usage/SKILL.md +69 -0
- package/skills/model-usage/references/codexbar-cli.md +33 -0
- package/skills/model-usage/scripts/model_usage.py +319 -0
- package/skills/model-usage/scripts/test_model_usage.py +40 -0
- package/skills/nano-pdf/SKILL.md +38 -0
- package/skills/node-connect/SKILL.md +142 -0
- package/skills/node-inspect-debugger/SKILL.md +85 -0
- package/skills/notion/SKILL.md +150 -0
- package/skills/obsidian/SKILL.md +119 -0
- package/skills/openai-whisper/SKILL.md +38 -0
- package/skills/openai-whisper-api/SKILL.md +71 -0
- package/skills/openai-whisper-api/scripts/transcribe.sh +154 -0
- package/skills/openhue/SKILL.md +112 -0
- package/skills/oracle/SKILL.md +126 -0
- package/skills/ordercli/SKILL.md +78 -0
- package/skills/peekaboo/SKILL.md +190 -0
- package/skills/pyproject.toml +10 -0
- package/skills/python-debugpy/SKILL.md +73 -0
- package/skills/sag/SKILL.md +87 -0
- package/skills/session-logs/SKILL.md +151 -0
- package/skills/sherpa-onnx-tts/SKILL.md +109 -0
- package/skills/sherpa-onnx-tts/bin/sherpa-onnx-tts +178 -0
- package/skills/skill-creator/SKILL.md +78 -0
- package/skills/skill-creator/license.txt +202 -0
- package/skills/skill-creator/scripts/init_skill.py +378 -0
- package/skills/skill-creator/scripts/package_skill.py +139 -0
- package/skills/skill-creator/scripts/quick_validate.py +169 -0
- package/skills/skill-creator/scripts/test_package_skill.py +161 -0
- package/skills/skill-creator/scripts/test_quick_validate.py +116 -0
- package/skills/slack/SKILL.md +78 -0
- package/skills/songsee/SKILL.md +49 -0
- package/skills/sonoscli/SKILL.md +65 -0
- package/skills/spike/SKILL.md +51 -0
- package/skills/spotify-player/SKILL.md +64 -0
- package/skills/summarize/SKILL.md +87 -0
- package/skills/taskflow/SKILL.md +149 -0
- package/skills/taskflow/examples/inbox-triage.lobster +33 -0
- package/skills/taskflow/examples/pr-intake.lobster +32 -0
- package/skills/taskflow-inbox-triage/SKILL.md +119 -0
- package/skills/things-mac/SKILL.md +86 -0
- package/skills/tmux/SKILL.md +91 -0
- package/skills/tmux/scripts/find-sessions.sh +112 -0
- package/skills/tmux/scripts/wait-for-text.sh +83 -0
- package/skills/trello/SKILL.md +108 -0
- package/skills/video-frames/SKILL.md +46 -0
- package/skills/video-frames/scripts/frame.sh +81 -0
- package/skills/voice-call/SKILL.md +45 -0
- package/skills/wacli/SKILL.md +72 -0
- package/skills/weather/SKILL.md +64 -0
- package/skills/xurl/SKILL.md +120 -0
|
@@ -0,0 +1,469 @@
|
|
|
1
|
+
---
|
|
2
|
+
summary: "Inbound image/audio/video understanding (optional) with provider + CLI fallbacks"
|
|
3
|
+
read_when:
|
|
4
|
+
- Designing or refactoring media understanding
|
|
5
|
+
- Tuning inbound audio/video/image preprocessing
|
|
6
|
+
title: "Media understanding"
|
|
7
|
+
sidebarTitle: "Media understanding"
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
Nodmix can **summarize inbound media** (image/audio/video) before the reply pipeline runs. It auto-detects when local tools or provider keys are available, and can be disabled or customized. If understanding is off, models still receive the original files/URLs as usual.
|
|
11
|
+
|
|
12
|
+
Vendor-specific media behavior is registered by vendor plugins, while Nodmix core owns the shared `tools.media` config, fallback order, and reply-pipeline integration.
|
|
13
|
+
|
|
14
|
+
## Goals
|
|
15
|
+
|
|
16
|
+
- Optional: pre-digest inbound media into short text for faster routing + better command parsing.
|
|
17
|
+
- Preserve original media delivery to the model (always).
|
|
18
|
+
- Support **provider APIs** and **CLI fallbacks**.
|
|
19
|
+
- Allow multiple models with ordered fallback (error/size/timeout).
|
|
20
|
+
|
|
21
|
+
## High-level behavior
|
|
22
|
+
|
|
23
|
+
<Steps>
|
|
24
|
+
<Step title="Collect attachments">
|
|
25
|
+
Collect inbound attachments (`MediaPaths`, `MediaUrls`, `MediaTypes`).
|
|
26
|
+
</Step>
|
|
27
|
+
<Step title="Select per-capability">
|
|
28
|
+
For each enabled capability (image/audio/video), select attachments per policy (default: **first**).
|
|
29
|
+
</Step>
|
|
30
|
+
<Step title="Choose model">
|
|
31
|
+
Choose the first eligible model entry (size + capability + auth).
|
|
32
|
+
</Step>
|
|
33
|
+
<Step title="Fallback on failure">
|
|
34
|
+
If a model fails or the media is too large, **fall back to the next entry**.
|
|
35
|
+
</Step>
|
|
36
|
+
<Step title="Apply success block">
|
|
37
|
+
On success:
|
|
38
|
+
|
|
39
|
+
- `Body` becomes an `[Image]`, `[Audio]`, or `[Video]` block.
|
|
40
|
+
- Audio sets `{{Transcript}}`; command parsing uses caption text when present, otherwise the transcript.
|
|
41
|
+
- Captions are preserved as `User text:` inside the block.
|
|
42
|
+
|
|
43
|
+
</Step>
|
|
44
|
+
</Steps>
|
|
45
|
+
|
|
46
|
+
If understanding fails or is disabled, **the reply flow continues** with the original body + attachments.
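
The selection and fallback steps above can be sketched roughly as follows. This is a minimal illustration, not Nodmix's implementation: `ModelEntry`, `understand`, and the entry names are hypothetical, and the auth check mentioned in the "Choose model" step is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ModelEntry:
    name: str                      # hypothetical label for the entry
    capabilities: Tuple[str, ...]  # media types this entry may handle
    max_bytes: int                 # size ceiling for this entry
    run: Callable[[bytes], str]    # stand-in for a provider or CLI call


def understand(capability: str, media: bytes, entries: List[ModelEntry]) -> Optional[str]:
    """Return the first successful summary, or None so the caller keeps the original body."""
    for entry in entries:
        if capability not in entry.capabilities:
            continue  # not eligible for this media type
        if len(media) > entry.max_bytes:
            continue  # media too large for this entry: try the next one
        try:
            return entry.run(media)
        except Exception:
            continue  # error or timeout: fall back to the next entry
    return None  # nothing worked; the reply flow continues unchanged


def failing(_: bytes) -> str:
    raise TimeoutError("simulated provider timeout")


entries = [
    ModelEntry("small-limit", ("image",), 4, lambda m: "never used"),
    ModelEntry("flaky", ("image",), 1024, failing),
    ModelEntry("fallback", ("image", "video"), 1024, lambda m: f"summary of {len(m)} bytes"),
]

image_result = understand("image", b"12345678", entries)  # skips two entries, then succeeds
audio_result = understand("audio", b"12345678", entries)  # no eligible entry
```

Returning `None` rather than raising mirrors the best-effort behavior described above: a failed understanding pass never blocks the reply.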
|
|
47
|
+
|
|
48
|
+
## Config overview
|
|
49
|
+
|
|
50
|
+
`tools.media` supports **shared models** plus per-capability overrides:
|
|
51
|
+
|
|
52
|
+
<AccordionGroup>
|
|
53
|
+
<Accordion title="Top-level keys">
|
|
54
|
+
- `tools.media.models`: shared model list (use `capabilities` to gate).
|
|
55
|
+
- `tools.media.image` / `tools.media.audio` / `tools.media.video`:
|
|
56
|
+
- defaults (`prompt`, `maxChars`, `maxBytes`, `timeoutSeconds`, `language`)
|
|
57
|
+
- provider overrides (`baseUrl`, `headers`, `providerOptions`)
|
|
58
|
+
- Deepgram audio options via `tools.media.audio.providerOptions.deepgram`
|
|
59
|
+
- audio transcript echo controls (`echoTranscript`, default `false`; `echoFormat`)
|
|
60
|
+
- optional **per-capability `models` list** (preferred before shared models)
|
|
61
|
+
- `attachments` policy (`mode`, `maxAttachments`, `prefer`)
|
|
62
|
+
- `scope` (optional gating by channel/chatType/session key)
|
|
63
|
+
- `tools.media.concurrency`: max concurrent capability runs (default **2**).
|
|
64
|
+
|
|
65
|
+
</Accordion>
|
|
66
|
+
</AccordionGroup>
|
|
67
|
+
|
|
68
|
+
```json5
|
|
69
|
+
{
|
|
70
|
+
tools: {
|
|
71
|
+
media: {
|
|
72
|
+
models: [
|
|
73
|
+
/* shared list */
|
|
74
|
+
],
|
|
75
|
+
image: {
|
|
76
|
+
/* optional overrides */
|
|
77
|
+
},
|
|
78
|
+
audio: {
|
|
79
|
+
/* optional overrides */
|
|
80
|
+
echoTranscript: true,
|
|
81
|
+
echoFormat: '📝 "{transcript}"',
|
|
82
|
+
},
|
|
83
|
+
video: {
|
|
84
|
+
/* optional overrides */
|
|
85
|
+
},
|
|
86
|
+
},
|
|
87
|
+
},
|
|
88
|
+
}
|
|
89
|
+
```
|
|
90
|
+
|
|
91
|
+
### Model entries
|
|
92
|
+
|
|
93
|
+
Each `models[]` entry can be **provider** or **CLI**:
|
|
94
|
+
|
|
95
|
+
<Tabs>
|
|
96
|
+
<Tab title="Provider entry">
|
|
97
|
+
```json5
|
|
98
|
+
{
|
|
99
|
+
type: "provider", // default if omitted
|
|
100
|
+
provider: "openai",
|
|
101
|
+
model: "gpt-5.5",
|
|
102
|
+
prompt: "Describe the image in <= 500 chars.",
|
|
103
|
+
maxChars: 500,
|
|
104
|
+
maxBytes: 10485760,
|
|
105
|
+
timeoutSeconds: 60,
|
|
106
|
+
capabilities: ["image"], // optional, used for multi-modal entries
|
|
107
|
+
profile: "vision-profile",
|
|
108
|
+
preferredProfile: "vision-fallback",
|
|
109
|
+
}
|
|
110
|
+
```
|
|
111
|
+
</Tab>
|
|
112
|
+
<Tab title="CLI entry">
|
|
113
|
+
```json5
|
|
114
|
+
{
|
|
115
|
+
type: "cli",
|
|
116
|
+
command: "gemini",
|
|
117
|
+
args: [
|
|
118
|
+
"-m",
|
|
119
|
+
"gemini-3-flash",
|
|
120
|
+
"--allowed-tools",
|
|
121
|
+
"read_file",
|
|
122
|
+
"Read the media at {{MediaPath}} and describe it in <= {{MaxChars}} characters.",
|
|
123
|
+
],
|
|
124
|
+
maxChars: 500,
|
|
125
|
+
maxBytes: 52428800,
|
|
126
|
+
timeoutSeconds: 120,
|
|
127
|
+
capabilities: ["video", "image"],
|
|
128
|
+
}
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
CLI templates can also use:
|
|
132
|
+
|
|
133
|
+
- `{{MediaDir}}` (directory containing the media file)
|
|
134
|
+
- `{{OutputDir}}` (scratch dir created for this run)
|
|
135
|
+
- `{{OutputBase}}` (scratch file base path, no extension)
|
|
136
|
+
|
|
137
|
+
</Tab>
|
|
138
|
+
</Tabs>
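
As a rough illustration of how the `{{...}}` placeholders in CLI `args` might be expanded, the sketch below substitutes values into each argument. The helper `expand_args` is hypothetical, and leaving unknown placeholders untouched is an assumption rather than documented behavior.

```python
import os
import re
from typing import Dict, List

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def expand_args(args: List[str], values: Dict[str, str]) -> List[str]:
    """Substitute {{Name}} placeholders in each argument.

    Assumption: names missing from `values` are left as-is.
    """
    def substitute(match):
        return values.get(match.group(1), match.group(0))

    return [PLACEHOLDER.sub(substitute, arg) for arg in args]


media_path = "/tmp/inbound/clip.mp4"  # hypothetical inbound file
values = {
    "MediaPath": media_path,
    "MediaDir": os.path.dirname(media_path),
    "MaxChars": "500",
}

expanded = expand_args(
    [
        "-m",
        "gemini-3-flash",
        "Read the media at {{MediaPath}} and describe it in <= {{MaxChars}} characters.",
    ],
    values,
)
```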
|
|
139
|
+
|
|
140
|
+
## Defaults and limits
|
|
141
|
+
|
|
142
|
+
Recommended defaults:
|
|
143
|
+
|
|
144
|
+
- `maxChars`: **500** for image/video (short, command-friendly)
|
|
145
|
+
- `maxChars`: **unset** for audio (full transcript unless you set a limit)
|
|
146
|
+
- `maxBytes`:
|
|
147
|
+
- image: **10MB**
|
|
148
|
+
- audio: **20MB**
|
|
149
|
+
- video: **50MB**
|
|
150
|
+
|
|
151
|
+
<AccordionGroup>
|
|
152
|
+
<Accordion title="Rules">
|
|
153
|
+
- If media exceeds `maxBytes`, that model is skipped and the **next model is tried**.
|
|
154
|
+
- Audio files smaller than **1024 bytes** are treated as empty/corrupt and skipped before provider/CLI transcription; inbound reply context receives a deterministic placeholder transcript so the agent knows the note was too small.
|
|
155
|
+
- If the model returns more than `maxChars`, output is trimmed.
|
|
156
|
+
- `prompt` defaults to a simple "Describe the {media}." plus the `maxChars` guidance (image/video only).
|
|
157
|
+
- If the active primary image model already supports vision natively, Nodmix skips the `[Image]` summary block and passes the original image into the model instead.
|
|
158
|
+
- If a Gateway/WebChat primary model is text-only, image attachments are preserved as offloaded `media://inbound/*` refs so the image/PDF tools or configured image model can still inspect them instead of losing the attachment.
|
|
159
|
+
- Explicit `nodmix infer image describe --model <provider/model>` requests are different: they run that image-capable provider/model directly, including Ollama refs such as `ollama/qwen2.5vl:7b`.
|
|
160
|
+
- If `<capability>.enabled: true` but no models are configured, Nodmix tries the **active reply model** when its provider supports the capability.
|
|
161
|
+
|
|
162
|
+
</Accordion>
|
|
163
|
+
</AccordionGroup>
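
Two of the rules above, the 1024-byte audio floor and `maxChars` trimming, can be sketched as small helpers. The function names and the placeholder wording are hypothetical; only the 1024-byte threshold and the trim-on-overflow behavior come from the rules.

```python
from typing import Optional

MIN_AUDIO_BYTES = 1024  # threshold stated in the rules above


def precheck_audio(size_bytes: int) -> Optional[str]:
    """Return a placeholder transcript for tiny audio, or None when transcription should run."""
    if size_bytes < MIN_AUDIO_BYTES:
        return "[audio too small to transcribe]"  # hypothetical placeholder wording
    return None


def trim_output(text: str, max_chars: Optional[int]) -> str:
    """Trim model output to max_chars; None means no limit (the audio default)."""
    if max_chars is None or len(text) <= max_chars:
        return text
    return text[:max_chars]
```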
|
|
164
|
+
|
|
165
|
+
### Auto-detect media understanding (default)
|
|
166
|
+
|
|
167
|
+
If `tools.media.<capability>.enabled` is **not** set to `false` and you haven't configured models, Nodmix auto-detects in this order and **stops at the first working option**:
|
|
168
|
+
|
|
169
|
+
<Steps>
|
|
170
|
+
<Step title="Active reply model">
|
|
171
|
+
Active reply model when its provider supports the capability.
|
|
172
|
+
</Step>
|
|
173
|
+
<Step title="agents.defaults.imageModel">
|
|
174
|
+
`agents.defaults.imageModel` primary/fallback refs (image only).
|
|
175
|
+
Prefer `provider/model` refs. Bare refs are qualified from configured image-capable provider model entries only when the match is unique.
|
|
176
|
+
</Step>
|
|
177
|
+
<Step title="Local CLIs (audio only)">
|
|
178
|
+
Local CLIs (if installed):
|
|
179
|
+
|
|
180
|
+
- `sherpa-onnx-offline` (requires `SHERPA_ONNX_MODEL_DIR` with encoder/decoder/joiner/tokens)
|
|
181
|
+
- `whisper-cli` (`whisper-cpp`; uses `WHISPER_CPP_MODEL` or the bundled tiny model)
|
|
182
|
+
- `whisper` (Python CLI; downloads models automatically)
|
|
183
|
+
|
|
184
|
+
</Step>
|
|
185
|
+
<Step title="Gemini CLI">
|
|
186
|
+
`gemini` using `read_many_files`.
|
|
187
|
+
</Step>
|
|
188
|
+
<Step title="Provider auth">
|
|
189
|
+
- Configured `models.providers.*` entries that support the capability are tried before the bundled fallback order.
|
|
190
|
+
- Image-only config providers with an image-capable model auto-register for media understanding even when they are not a bundled vendor plugin.
|
|
191
|
+
- Ollama image understanding is available when selected explicitly, for example through `agents.defaults.imageModel` or `nodmix infer image describe --model ollama/<vision-model>`.
|
|
192
|
+
|
|
193
|
+
Bundled fallback order:
|
|
194
|
+
|
|
195
|
+
- Audio: OpenAI → Groq → xAI → Deepgram → OpenRouter → Google → SenseAudio → ElevenLabs → Mistral
|
|
196
|
+
- Image: OpenAI → Anthropic → Google → MiniMax → MiniMax Portal → Z.AI
|
|
197
|
+
- Video: Google → Qwen → Moonshot
|
|
198
|
+
|
|
199
|
+
</Step>
|
|
200
|
+
</Steps>
|
|
201
|
+
|
|
202
|
+
To disable auto-detection, set:
|
|
203
|
+
|
|
204
|
+
```json5
|
|
205
|
+
{
|
|
206
|
+
tools: {
|
|
207
|
+
media: {
|
|
208
|
+
audio: {
|
|
209
|
+
enabled: false,
|
|
210
|
+
},
|
|
211
|
+
},
|
|
212
|
+
},
|
|
213
|
+
}
|
|
214
|
+
```
|
|
215
|
+
|
|
216
|
+
<Note>
|
|
217
|
+
Binary detection is best-effort across macOS/Linux/Windows; ensure the CLI is on `PATH` (we expand `~`), or set an explicit CLI model with a full command path.
|
|
218
|
+
</Note>
|
|
219
|
+
|
|
220
|
+
### Proxy environment support (provider models)
|
|
221
|
+
|
|
222
|
+
When provider-based **audio** and **video** media understanding is enabled, Nodmix honors standard outbound proxy environment variables for provider HTTP calls:
|
|
223
|
+
|
|
224
|
+
- `HTTPS_PROXY`
|
|
225
|
+
- `HTTP_PROXY`
|
|
226
|
+
- `ALL_PROXY`
|
|
227
|
+
- `https_proxy`
|
|
228
|
+
- `http_proxy`
|
|
229
|
+
- `all_proxy`
|
|
230
|
+
|
|
231
|
+
If no proxy env vars are set, media understanding uses direct egress. If the proxy value is malformed, Nodmix logs a warning and falls back to direct fetch.
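
A minimal sketch of that lookup, assuming the variables are checked in the order listed (the actual precedence is not stated here) and that "malformed" means the value lacks a scheme or host. `resolve_proxy` is a hypothetical helper.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit

# Assumed lookup order: the order in which the variables are listed above.
PROXY_VARS = ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY", "https_proxy", "http_proxy", "all_proxy")


def resolve_proxy(env: Mapping[str, str]) -> Optional[str]:
    """Return a usable proxy URL, or None to use direct egress."""
    for name in PROXY_VARS:
        value = env.get(name, "").strip()
        if not value:
            continue
        try:
            parts = urlsplit(value)
            valid = bool(parts.scheme and parts.hostname)
        except ValueError:
            valid = False
        if valid:
            return value
        # Malformed value: warn and fall back to direct fetch.
        print(f"warning: ignoring malformed proxy value in {name}")
        return None
    return None
```

Calling `resolve_proxy(os.environ)` would apply the same logic to the real process environment.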
|
|
232
|
+
|
|
233
|
+
## Capabilities (optional)
|
|
234
|
+
|
|
235
|
+
If you set `capabilities`, the entry only runs for those media types. For shared lists, Nodmix can infer defaults:
|
|
236
|
+
|
|
237
|
+
- `openai`, `anthropic`, `minimax`: **image**
|
|
238
|
+
- `minimax-portal`: **image**
|
|
239
|
+
- `moonshot`: **image + video**
|
|
240
|
+
- `openrouter`: **image + audio**
|
|
241
|
+
- `google` (Gemini API): **image + audio + video**
|
|
242
|
+
- `qwen`: **image + video**
|
|
243
|
+
- `mistral`: **audio**
|
|
244
|
+
- `zai`: **image**
|
|
245
|
+
- `groq`: **audio**
|
|
246
|
+
- `xai`: **audio**
|
|
247
|
+
- `deepgram`: **audio**
|
|
248
|
+
- Any `models.providers.<id>.models[]` catalog with an image-capable model: **image**
|
|
249
|
+
|
|
250
|
+
For CLI entries, **set `capabilities` explicitly** to avoid surprising matches. If you omit `capabilities`, the entry is eligible for the list it appears in.
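
For provider entries in the shared list, the gating can be pictured as a lookup with an explicit override. The table below copies the inferred defaults listed above; the function name and the treatment of unknown providers (not eligible) are assumptions made for this sketch.

```python
INFERRED_CAPABILITIES = {
    "openai": {"image"},
    "anthropic": {"image"},
    "minimax": {"image"},
    "minimax-portal": {"image"},
    "moonshot": {"image", "video"},
    "openrouter": {"image", "audio"},
    "google": {"image", "audio", "video"},
    "qwen": {"image", "video"},
    "mistral": {"audio"},
    "zai": {"image"},
    "groq": {"audio"},
    "xai": {"audio"},
    "deepgram": {"audio"},
}


def shared_entry_runs_for(entry: dict, capability: str) -> bool:
    """Decide whether a shared-list provider entry applies to a media type."""
    explicit = entry.get("capabilities")
    if explicit is not None:
        return capability in explicit  # explicit list always wins
    inferred = INFERRED_CAPABILITIES.get(entry.get("provider", ""))
    # Assumption: providers without an inferred default are not eligible.
    return inferred is not None and capability in inferred
```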
|
|
251
|
+
|
|
252
|
+
## Provider support matrix (Nodmix integrations)
|
|
253
|
+
|
|
254
|
+
| Capability | Provider integration | Notes |
|
|
255
|
+
| ---------- | ---------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
256
|
+
| Image | OpenAI, OpenAI Codex OAuth, Codex app-server, OpenRouter, Anthropic, Google, MiniMax, Moonshot, Qwen, Z.AI, config providers | Vendor plugins register image support; `openai-codex/*` uses OAuth provider plumbing; `codex/*` uses a bounded Codex app-server turn; MiniMax and MiniMax OAuth both use `MiniMax-VL-01`; image-capable config providers auto-register. |
|
|
257
|
+
| Audio | OpenAI, Groq, xAI, Deepgram, OpenRouter, Google, SenseAudio, ElevenLabs, Mistral | Provider transcription (Whisper/Groq/xAI/Deepgram/OpenRouter STT/Gemini/SenseAudio/Scribe/Voxtral). |
|
|
258
|
+
| Video | Google, Qwen, Moonshot | Provider video understanding via vendor plugins; Qwen video understanding uses the Standard DashScope endpoints. |
|
|
259
|
+
|
|
260
|
+
<Note>
|
|
261
|
+
**MiniMax note**
|
|
262
|
+
|
|
263
|
+
- `minimax`, `minimax-cn`, `minimax-portal`, and `minimax-portal-cn` image understanding comes from the plugin-owned `MiniMax-VL-01` media provider.
|
|
264
|
+
- Automatic image routing keeps using `MiniMax-VL-01` even if legacy MiniMax M2.x chat metadata claims image input.
|
|
265
|
+
|
|
266
|
+
</Note>
|
|
267
|
+
|
|
268
|
+
## Model selection guidance
|
|
269
|
+
|
|
270
|
+
- Prefer the strongest latest-generation model available for each media capability when quality and safety matter.
|
|
271
|
+
- For tool-enabled agents handling untrusted inputs, avoid older/weaker media models.
|
|
272
|
+
- Keep at least one fallback per capability for availability (quality model + faster/cheaper model).
|
|
273
|
+
- CLI fallbacks (`whisper-cli`, `whisper`, `gemini`) are useful when provider APIs are unavailable.
|
|
274
|
+
- `parakeet-mlx` note: with `--output-dir`, Nodmix reads `<output-dir>/<media-basename>.txt` when output format is `txt` (or unspecified); non-`txt` formats fall back to stdout.
|
|
275
|
+
|
|
276
|
+
## Attachment policy
|
|
277
|
+
|
|
278
|
+
Per-capability `attachments` controls which attachments are processed:
|
|
279
|
+
|
|
280
|
+
<ParamField path="mode" type='"first" | "all"' default="first">
|
|
281
|
+
Whether to process the first selected attachment or all of them.
|
|
282
|
+
</ParamField>
|
|
283
|
+
<ParamField path="maxAttachments" type="number" default="1">
|
|
284
|
+
Cap the number processed.
|
|
285
|
+
</ParamField>
|
|
286
|
+
<ParamField path="prefer" type='"first" | "last" | "path" | "url"'>
|
|
287
|
+
Selection preference among candidate attachments.
|
|
288
|
+
</ParamField>
|
|
289
|
+
|
|
290
|
+
When `mode: "all"`, outputs are labeled `[Image 1/2]`, `[Audio 2/2]`, etc.
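
A small sketch of the policy, assuming that `mode: "all"` processes up to `maxAttachments` candidates in order and that `prefer` has already been applied to the candidate list. Both helper names are hypothetical.

```python
from typing import List


def select_attachments(candidates: List[str], mode: str = "first", max_attachments: int = 1) -> List[str]:
    """Pick which attachments to process for one capability."""
    if mode == "first":
        return candidates[:1]
    # mode == "all": assumed to be capped by max_attachments
    return candidates[:max_attachments]


def numbered_labels(kind: str, count: int) -> List[str]:
    """Build labels such as [Image 1/2] for outputs produced in "all" mode."""
    return [f"[{kind} {index}/{count}]" for index in range(1, count + 1)]


selected = select_attachments(["a.png", "b.png", "c.png"], mode="all", max_attachments=2)
labels = numbered_labels("Image", len(selected))
```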
|
|
291
|
+
|
|
292
|
+
<AccordionGroup>
|
|
293
|
+
<Accordion title="File-attachment extraction behavior">
|
|
294
|
+
- Extracted file text is wrapped as **untrusted external content** before it is appended to the media prompt.
|
|
295
|
+
- The injected block uses explicit boundary markers like `<<<EXTERNAL_UNTRUSTED_CONTENT id="...">>>` / `<<<END_EXTERNAL_UNTRUSTED_CONTENT id="...">>>` and includes a `Source: External` metadata line.
|
|
296
|
+
- This attachment-extraction path intentionally omits the long `SECURITY NOTICE:` banner to avoid bloating the media prompt; the boundary markers and metadata still remain.
|
|
297
|
+
- If a file has no extractable text, Nodmix injects `[No extractable text]`.
|
|
298
|
+
- If a PDF falls back to rendered page images in this path, the media prompt keeps the placeholder `[PDF content rendered to images; images not forwarded to model]` because this attachment-extraction step forwards text blocks, not the rendered PDF images.
|
|
299
|
+
|
|
300
|
+
</Accordion>
|
|
301
|
+
</AccordionGroup>
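
The wrapping described above might look like the sketch below. The boundary markers, the `Source: External` line, and the `[No extractable text]` placeholder come from the list; the exact line layout inside the block, the helper name, and the example id are assumptions.

```python
def wrap_untrusted(text: str, block_id: str) -> str:
    """Wrap extracted file text in explicit untrusted-content boundary markers."""
    body = text.strip() or "[No extractable text]"
    return "\n".join(
        [
            f'<<<EXTERNAL_UNTRUSTED_CONTENT id="{block_id}">>>',
            "Source: External",
            body,
            f'<<<END_EXTERNAL_UNTRUSTED_CONTENT id="{block_id}">>>',
        ]
    )


# "example-1" is a made-up id for illustration only.
wrapped = wrap_untrusted("Quarterly report: revenue up 4%.", "example-1")
empty = wrap_untrusted("   ", "example-2")
```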
|
|
302
|
+
|
|
303
|
+
## Config examples
|
|
304
|
+
|
|
305
|
+
<Tabs>
|
|
306
|
+
<Tab title="Shared models + overrides">
|
|
307
|
+
```json5
|
|
308
|
+
{
|
|
309
|
+
tools: {
|
|
310
|
+
media: {
|
|
311
|
+
models: [
|
|
312
|
+
{ provider: "openai", model: "gpt-5.5", capabilities: ["image"] },
|
|
313
|
+
{
|
|
314
|
+
provider: "google",
|
|
315
|
+
model: "gemini-3-flash-preview",
|
|
316
|
+
capabilities: ["image", "audio", "video"],
|
|
317
|
+
},
|
|
318
|
+
{
|
|
319
|
+
type: "cli",
|
|
320
|
+
command: "gemini",
|
|
321
|
+
args: [
|
|
322
|
+
"-m",
|
|
323
|
+
"gemini-3-flash",
|
|
324
|
+
"--allowed-tools",
|
|
325
|
+
"read_file",
|
|
326
|
+
"Read the media at {{MediaPath}} and describe it in <= {{MaxChars}} characters.",
|
|
327
|
+
],
|
|
328
|
+
capabilities: ["image", "video"],
|
|
329
|
+
},
|
|
330
|
+
],
|
|
331
|
+
audio: {
|
|
332
|
+
attachments: { mode: "all", maxAttachments: 2 },
|
|
333
|
+
},
|
|
334
|
+
video: {
|
|
335
|
+
maxChars: 500,
|
|
336
|
+
},
|
|
337
|
+
},
|
|
338
|
+
},
|
|
339
|
+
}
|
|
340
|
+
```
|
|
341
|
+
</Tab>
|
|
342
|
+
<Tab title="Audio + video only">
|
|
343
|
+
```json5
|
|
344
|
+
{
|
|
345
|
+
tools: {
|
|
346
|
+
media: {
|
|
347
|
+
audio: {
|
|
348
|
+
enabled: true,
|
|
349
|
+
models: [
|
|
350
|
+
{ provider: "openai", model: "gpt-4o-mini-transcribe" },
|
|
351
|
+
{
|
|
352
|
+
type: "cli",
|
|
353
|
+
command: "whisper",
|
|
354
|
+
args: ["--model", "base", "{{MediaPath}}"],
|
|
355
|
+
},
|
|
356
|
+
],
|
|
357
|
+
},
|
|
358
|
+
video: {
|
|
359
|
+
enabled: true,
|
|
360
|
+
maxChars: 500,
|
|
361
|
+
models: [
|
|
362
|
+
{ provider: "google", model: "gemini-3-flash-preview" },
|
|
363
|
+
{
|
|
364
|
+
type: "cli",
|
|
365
|
+
command: "gemini",
|
|
366
|
+
args: [
|
|
367
|
+
"-m",
|
|
368
|
+
"gemini-3-flash",
|
|
369
|
+
"--allowed-tools",
|
|
370
|
+
"read_file",
|
|
371
|
+
"Read the media at {{MediaPath}} and describe it in <= {{MaxChars}} characters.",
|
|
372
|
+
],
|
|
373
|
+
},
|
|
374
|
+
],
|
|
375
|
+
},
|
|
376
|
+
},
|
|
377
|
+
},
|
|
378
|
+
}
|
|
379
|
+
```
|
|
380
|
+
</Tab>
|
|
381
|
+
<Tab title="Image-only">
|
|
382
|
+
```json5
|
|
383
|
+
{
|
|
384
|
+
tools: {
|
|
385
|
+
media: {
|
|
386
|
+
image: {
|
|
387
|
+
enabled: true,
|
|
388
|
+
maxBytes: 10485760,
|
|
389
|
+
maxChars: 500,
|
|
390
|
+
models: [
|
|
391
|
+
{ provider: "openai", model: "gpt-5.5" },
|
|
392
|
+
{ provider: "anthropic", model: "claude-opus-4-6" },
|
|
393
|
+
{
|
|
394
|
+
type: "cli",
|
|
395
|
+
command: "gemini",
|
|
396
|
+
args: [
|
|
397
|
+
"-m",
|
|
398
|
+
"gemini-3-flash",
|
|
399
|
+
"--allowed-tools",
|
|
400
|
+
"read_file",
|
|
401
|
+
"Read the media at {{MediaPath}} and describe it in <= {{MaxChars}} characters.",
|
|
402
|
+
],
|
|
403
|
+
},
|
|
404
|
+
],
|
|
405
|
+
},
|
|
406
|
+
},
|
|
407
|
+
},
|
|
408
|
+
}
|
|
409
|
+
```
|
|
410
|
+
</Tab>
|
|
411
|
+
<Tab title="Multi-modal single entry">
|
|
412
|
+
```json5
|
|
413
|
+
{
|
|
414
|
+
tools: {
|
|
415
|
+
media: {
|
|
416
|
+
image: {
|
|
417
|
+
models: [
|
|
418
|
+
{
|
|
419
|
+
provider: "google",
|
|
420
|
+
model: "gemini-3.1-pro-preview",
|
|
421
|
+
capabilities: ["image", "video", "audio"],
|
|
422
|
+
},
|
|
423
|
+
],
|
|
424
|
+
},
|
|
425
|
+
audio: {
|
|
426
|
+
models: [
|
|
427
|
+
{
|
|
428
|
+
provider: "google",
|
|
429
|
+
model: "gemini-3.1-pro-preview",
|
|
430
|
+
capabilities: ["image", "video", "audio"],
|
|
431
|
+
},
|
|
432
|
+
],
|
|
433
|
+
},
|
|
434
|
+
video: {
|
|
435
|
+
models: [
|
|
436
|
+
{
|
|
437
|
+
provider: "google",
|
|
438
|
+
model: "gemini-3.1-pro-preview",
|
|
439
|
+
capabilities: ["image", "video", "audio"],
|
|
440
|
+
},
|
|
441
|
+
],
|
|
442
|
+
},
|
|
443
|
+
},
|
|
444
|
+
},
|
|
445
|
+
}
|
|
446
|
+
```
|
|
447
|
+
</Tab>
|
|
448
|
+
</Tabs>
|
|
449
|
+
|
|
450
|
+
## Status output
|
|
451
|
+
|
|
452
|
+
When media understanding runs, `/status` includes a short summary line:
|
|
453
|
+
|
|
454
|
+
```
|
|
455
|
+
📎 Media: image ok (openai/gpt-5.4) · audio skipped (maxBytes)
|
|
456
|
+
```
|
|
457
|
+
|
|
458
|
+
This shows per-capability outcomes and the chosen provider/model when applicable.
|
|
459
|
+
|
|
460
|
+
## Notes
|
|
461
|
+
|
|
462
|
+
- Understanding is **best-effort**. Errors do not block replies.
|
|
463
|
+
- Attachments are still passed to models even when understanding is disabled.
|
|
464
|
+
- Use `scope` to limit where understanding runs (e.g. only DMs).
|
|
465
|
+
|
|
466
|
+
## Related
|
|
467
|
+
|
|
468
|
+
- [Configuration](/gateway/configuration)
|
|
469
|
+
- [Image & media support](/nodes/images)
|
|
@@ -0,0 +1,154 @@
|
|
|
1
|
+
---
|
|
2
|
+
summary: "Talk mode: continuous speech conversations across local STT/TTS and realtime voice"
|
|
3
|
+
read_when:
|
|
4
|
+
- Implementing Talk mode on macOS/iOS/Android
|
|
5
|
+
- Changing voice/TTS/interrupt behavior
|
|
6
|
+
title: "Talk mode"
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
Talk mode has several runtime shapes:
|
|
10
|
+
|
|
11
|
+
- Native macOS/iOS/Android Talk uses local speech recognition, Gateway chat, and `talk.speak` TTS. Nodes advertise the `talk` capability and declare the `talk.*` commands they support.
|
|
12
|
+
- Browser Talk uses `talk.client.create` for client-owned `webrtc` and `provider-websocket` sessions, or `talk.session.create` for Gateway-owned `gateway-relay` sessions. `managed-room` is reserved for Gateway handoff and walkie-talkie rooms.
|
|
13
|
+
- Android Talk can opt into Gateway-owned realtime relay sessions with `talk.realtime.mode: "realtime"` and `talk.realtime.transport: "gateway-relay"`. Otherwise it stays on native speech recognition, Gateway chat, and `talk.speak`.
|
|
14
|
+
- Transcription-only clients use `talk.session.create({ mode: "transcription", transport: "gateway-relay", brain: "none" })`, then `talk.session.appendAudio`, `talk.session.cancelTurn`, and `talk.session.close` when they need captions or dictation without an assistant voice response.
|
|
15
|
+
|
|
16
|
+
Native Talk is a continuous voice conversation loop:
|
|
17
|
+
|
|
18
|
+
1. Listen for speech
|
|
19
|
+
2. Send transcript to the model through the active session
|
|
20
|
+
3. Wait for the response
|
|
21
|
+
4. Speak it via the configured Talk provider (`talk.speak`)
|
|
22
|
+
|
|
23
|
+
Browser realtime Talk forwards provider tool calls through `talk.client.toolCall`; browser clients do not call `chat.send` directly for realtime consults.
|
|
24
|
+
|
|
25
|
+
Transcription-only Talk emits the same common Talk event envelope as realtime and STT/TTS sessions, but uses `mode: "transcription"` and `brain: "none"`. It is for captions, dictation, and observe-only speech capture; one-shot uploaded voice notes still use the media/audio path.
|
|
26
|
+
|
|
27
|
+
## Behavior (macOS)
|
|
28
|
+
|
|
29
|
+
- **Always-on overlay** while Talk mode is enabled.
|
|
30
|
+
- **Listening → Thinking → Speaking** phase transitions.
|
|
31
|
+
- On a **short pause** (silence window), the current transcript is sent.
|
|
32
|
+
- Replies are **written to WebChat** (same as typing).
|
|
33
|
+
- **Interrupt on speech** (default on): if the user starts talking while the assistant is speaking, we stop playback and note the interruption timestamp for the next prompt.
|
|
34
|
+
|
|
35
|
+
## Voice directives in replies
|
|
36
|
+
|
|
37
|
+
The assistant may prefix its reply with a **single JSON line** to control voice:
|
|
38
|
+
|
|
39
|
+
```json
|
|
40
|
+
{ "voice": "<voice-id>", "once": true }
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
Rules:
|
|
44
|
+
|
|
45
|
+
- First non-empty line only.
|
|
46
|
+
- Unknown keys are ignored.
|
|
47
|
+
- `once: true` applies to the current reply only.
|
|
48
|
+
- Without `once`, the voice becomes the new default for Talk mode.
|
|
49
|
+
- The JSON line is stripped before TTS playback.
|
|
50
|
+
|
|
51
|
+
Supported keys:
|
|
52
|
+
|
|
53
|
+
- `voice` / `voice_id` / `voiceId`
|
|
54
|
+
- `model` / `model_id` / `modelId`
|
|
55
|
+
- `speed`, `rate` (WPM), `stability`, `similarity`, `style`, `speakerBoost`
|
|
56
|
+
- `seed`, `normalize`, `lang`, `output_format`, `latency_tier`
|
|
57
|
+
- `once`
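
The directive rules above can be sketched as a small parser that inspects only the first non-empty line, drops unknown keys, and strips the JSON line before the text goes to TTS. The function name is hypothetical, and treating a non-object first line as "no directive" is an assumption.

```python
import json
from typing import List, Optional, Tuple

SUPPORTED_KEYS = {
    "voice", "voice_id", "voiceId",
    "model", "model_id", "modelId",
    "speed", "rate", "stability", "similarity", "style", "speakerBoost",
    "seed", "normalize", "lang", "output_format", "latency_tier",
    "once",
}


def split_voice_directive(reply: str) -> Tuple[Optional[dict], str]:
    """Return (directive, text_for_tts); directive is None when no JSON line leads the reply."""
    lines: List[str] = reply.splitlines()
    for index, line in enumerate(lines):
        if not line.strip():
            continue  # skip leading blank lines
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            return None, reply  # first non-empty line is ordinary text
        if not isinstance(parsed, dict):
            return None, reply
        directive = {key: value for key, value in parsed.items() if key in SUPPORTED_KEYS}
        remaining = "\n".join(lines[:index] + lines[index + 1:]).strip()
        return directive, remaining
    return None, reply


directive, spoken = split_voice_directive('{"voice": "v1", "once": true, "bogus": 1}\nHello there')
```

A caller would then apply `directive` to the current reply only when `directive.get("once")` is true, and otherwise store the voice as the new Talk default.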

## Config (`~/.nodmix/nodmix.json`)

```json5
{
  talk: {
    provider: "elevenlabs",
    providers: {
      elevenlabs: {
        voiceId: "elevenlabs_voice_id",
        modelId: "eleven_v3",
        outputFormat: "mp3_44100_128",
        apiKey: "elevenlabs_api_key",
      },
      mlx: {
        modelId: "mlx-community/Soprano-80M-bf16",
      },
      system: {},
    },
    speechLocale: "ru-RU",
    silenceTimeoutMs: 1500,
    interruptOnSpeech: true,
    realtime: {
      provider: "openai",
      providers: {
        openai: {
          apiKey: "openai_api_key",
          model: "gpt-realtime-2",
          voice: "cedar",
        },
      },
      instructions: "Speak warmly and keep answers brief.",
      mode: "realtime",
      transport: "webrtc",
      brain: "agent-consult",
    },
  },
}
```

Defaults:

- `interruptOnSpeech`: true
- `silenceTimeoutMs`: when unset, Talk keeps the platform default pause window before sending the transcript (700 ms on macOS and Android, 900 ms on iOS).
- `provider`: selects the active Talk provider. Use `elevenlabs`, `mlx`, or `system` for the macOS-local playback paths.
- `providers.<provider>.voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID` for ElevenLabs (or first ElevenLabs voice when API key is available).
- `providers.elevenlabs.modelId`: defaults to `eleven_v3` when unset.
- `providers.mlx.modelId`: defaults to `mlx-community/Soprano-80M-bf16` when unset.
- `providers.elevenlabs.apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available).
- `consultThinkingLevel`: optional thinking level override for the full Nodmix agent run behind realtime `nodmix_agent_consult` calls.
- `consultFastMode`: optional fast-mode override for realtime `nodmix_agent_consult` calls.
- `realtime.provider`: selects the active browser/server realtime voice provider. Use `openai` for WebRTC, `google` for provider WebSocket, or a bridge-only provider through Gateway relay.
- `realtime.providers.<provider>` stores provider-owned realtime config. The browser receives only ephemeral or constrained session credentials, never a standard API key.
- `realtime.providers.openai.voice`: built-in OpenAI Realtime voice id. Current `gpt-realtime-2` voices are `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, and `cedar`; `marin` and `cedar` are recommended for best quality.
- `realtime.transport`: `webrtc` and `provider-websocket` are browser realtime transports. Android uses realtime relay only when this is `gateway-relay`; otherwise Android Talk uses its native STT/TTS loop.
- `realtime.brain`: `agent-consult` routes realtime tool calls through Gateway policy; `direct-tools` is legacy direct-tool compatibility behavior; `none` is for transcription or external orchestration.
- `realtime.instructions`: appends provider-facing system instructions to Nodmix's built-in realtime prompt. Use it for voice style and tone; Nodmix keeps the default `nodmix_agent_consult` guidance.
- `talk.catalog` exposes each provider's valid modes, transports, brain strategies, realtime audio formats, and capability flags so first-party Talk clients can avoid unsupported combinations.
- Streaming transcription providers are discovered through `talk.catalog.transcription`. The current Gateway relay uses the Voice Call streaming provider config until the dedicated Talk transcription config surface is added.
- `speechLocale`: optional BCP 47 locale id for on-device Talk speech recognition on iOS/macOS. Leave unset to use the device default.
- `outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming).
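
To make the platform-dependent defaults above concrete, here is a minimal sketch of how a client might resolve them. The `TalkSettings` shape is a flattened, hypothetical stand-in for the real config (which nests `outputFormat` under a provider and `transport` under `realtime`), and `resolveTalkDefaults` is an illustrative name, not a Nodmix function. Only the default values themselves come from the list above.

```typescript
// Hypothetical sketch; TalkSettings flattens the real nested config for brevity.
type Platform = "macos" | "ios" | "android";

interface TalkSettings {
  interruptOnSpeech?: boolean;
  silenceTimeoutMs?: number;
  outputFormat?: string;
  realtimeTransport?: string;
}

// Platform pause windows used when silenceTimeoutMs is unset.
const PLATFORM_SILENCE_MS: Record<Platform, number> = {
  macos: 700,
  android: 700,
  ios: 900,
};

interface ResolvedTalkSettings {
  interruptOnSpeech: boolean;
  silenceTimeoutMs: number;
  outputFormat: string;
  androidUsesRealtimeRelay: boolean;
}

function resolveTalkDefaults(cfg: TalkSettings, platform: Platform): ResolvedTalkSettings {
  return {
    interruptOnSpeech: cfg.interruptOnSpeech ?? true,
    silenceTimeoutMs: cfg.silenceTimeoutMs ?? PLATFORM_SILENCE_MS[platform],
    // PCM by default; an explicit mp3_* value forces MP3 streaming.
    outputFormat: cfg.outputFormat ?? (platform === "android" ? "pcm_24000" : "pcm_44100"),
    // Android uses realtime relay only for gateway-relay; otherwise native STT/TTS.
    androidUsesRealtimeRelay:
      platform === "android" && cfg.realtimeTransport === "gateway-relay",
  };
}
```

For example, `resolveTalkDefaults({}, "ios")` yields a 900 ms pause window and `pcm_44100` output, while an explicit `silenceTimeoutMs: 1500` overrides the platform default on any platform.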

## macOS UI

- Menu bar toggle: **Talk**
- Config tab: **Talk Mode** group (voice id + interrupt toggle)
- Overlay:
  - **Listening**: cloud pulses with mic level
  - **Thinking**: sinking animation
  - **Speaking**: radiating rings
  - Click cloud: stop speaking
  - Click X: exit Talk mode

## Android UI

- Voice tab toggle: **Talk**
- Manual **Mic** and **Talk** are mutually exclusive runtime capture modes.
- Manual Mic stops when the app leaves the foreground or the user leaves the Voice tab.
- Talk Mode keeps running until toggled off or the Android node disconnects, and uses Android's microphone foreground-service type while active.

## Notes

- Requires Speech + Microphone permissions.
- Native Talk uses the active Gateway session and only falls back to history polling when response events are unavailable.
- Browser realtime Talk uses `talk.client.toolCall` for `nodmix_agent_consult` instead of exposing `chat.send` to provider-owned browser sessions.
- Transcription-only Talk uses `talk.session.create`, `talk.session.appendAudio`, `talk.session.cancelTurn`, and `talk.session.close`; clients subscribe to `talk.event` for partial/final transcript updates.
- The gateway resolves Talk playback through `talk.speak` using the active Talk provider. Android falls back to local system TTS only when that RPC is unavailable.
- macOS local MLX playback uses the bundled `nodmix-mlx-tts` helper when present, or an executable on `PATH`. Set `NODMIX_MLX_TTS_BIN` to point at a custom helper binary during development.
- `stability` for `eleven_v3` is validated to `0.0`, `0.5`, or `1.0`; other models accept `0..1`.
- `latency_tier` is validated to `0..4` when set.
- Android supports `pcm_16000`, `pcm_22050`, `pcm_24000`, and `pcm_44100` output formats for low-latency AudioTrack streaming.
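
The validation rules in the last three notes can be expressed as simple predicates. The sketch below is illustrative only: the function names are hypothetical, and it assumes `latency_tier` must be an integer, which the note does not say outright.

```typescript
// Hypothetical validators mirroring the notes above; not Nodmix's actual code.
function isValidStability(modelId: string, stability: number): boolean {
  if (modelId === "eleven_v3") {
    // eleven_v3 accepts only three discrete values.
    return stability === 0 || stability === 0.5 || stability === 1;
  }
  // Other models accept any value in the closed range 0..1.
  return stability >= 0 && stability <= 1;
}

function isValidLatencyTier(tier: number): boolean {
  // Assumption: tiers are whole numbers in 0..4.
  return Number.isInteger(tier) && tier >= 0 && tier <= 4;
}

// PCM formats the notes list for low-latency Android AudioTrack streaming.
const ANDROID_PCM_FORMATS: readonly string[] = [
  "pcm_16000",
  "pcm_22050",
  "pcm_24000",
  "pcm_44100",
];

function isAndroidLowLatencyFormat(format: string): boolean {
  return ANDROID_PCM_FORMATS.indexOf(format) !== -1;
}
```

For instance, a `stability` of `0.3` would pass for a model other than `eleven_v3` but fail for `eleven_v3`, so a client can reject the directive value before sending a playback request.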

## Related

- [Voice wake](/nodes/voicewake)
- [Audio and voice notes](/nodes/audio)
- [Media understanding](/nodes/media-understanding)