solana-agent 31.2.6__tar.gz → 31.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (49)
  1. {solana_agent-31.2.6 → solana_agent-31.3.0}/PKG-INFO +115 -9
  2. {solana_agent-31.2.6 → solana_agent-31.3.0}/README.md +114 -8
  3. {solana_agent-31.2.6 → solana_agent-31.3.0}/pyproject.toml +1 -1
  4. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/adapters/openai_realtime_ws.py +160 -31
  5. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/client/solana_agent.py +7 -1
  6. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/interfaces/client/client.py +3 -1
  7. solana_agent-31.3.0/solana_agent/interfaces/providers/__init__.py +0 -0
  8. solana_agent-31.3.0/solana_agent/interfaces/providers/realtime.py +212 -0
  9. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/interfaces/services/query.py +3 -1
  10. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/services/query.py +422 -107
  11. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/services/realtime.py +123 -17
  12. solana_agent-31.2.6/solana_agent/interfaces/providers/realtime.py +0 -100
  13. {solana_agent-31.2.6 → solana_agent-31.3.0}/LICENSE +0 -0
  14. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/__init__.py +0 -0
  15. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/adapters/__init__.py +0 -0
  16. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/adapters/ffmpeg_transcoder.py +0 -0
  17. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/adapters/mongodb_adapter.py +0 -0
  18. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/adapters/openai_adapter.py +0 -0
  19. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/adapters/pinecone_adapter.py +0 -0
  20. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/cli.py +0 -0
  21. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/client/__init__.py +0 -0
  22. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/domains/__init__.py +0 -0
  23. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/domains/agent.py +0 -0
  24. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/domains/routing.py +0 -0
  25. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/factories/__init__.py +0 -0
  26. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/factories/agent_factory.py +0 -0
  27. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/guardrails/pii.py +0 -0
  28. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/interfaces/__init__.py +0 -0
  29. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/interfaces/guardrails/guardrails.py +0 -0
  30. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/interfaces/plugins/plugins.py +0 -0
  31. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/interfaces/providers/audio.py +0 -0
  32. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/interfaces/providers/data_storage.py +0 -0
  33. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/interfaces/providers/llm.py +0 -0
  34. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/interfaces/providers/memory.py +0 -0
  35. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/interfaces/providers/vector_storage.py +0 -0
  36. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/interfaces/services/agent.py +0 -0
  37. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/interfaces/services/knowledge_base.py +0 -0
  38. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/interfaces/services/routing.py +0 -0
  39. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/plugins/__init__.py +0 -0
  40. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/plugins/manager.py +0 -0
  41. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/plugins/registry.py +0 -0
  42. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/plugins/tools/__init__.py +0 -0
  43. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/plugins/tools/auto_tool.py +0 -0
  44. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/repositories/__init__.py +0 -0
  45. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/repositories/memory.py +0 -0
  46. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/services/__init__.py +0 -0
  47. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/services/agent.py +0 -0
  48. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/services/knowledge_base.py +0 -0
  49. {solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/services/routing.py +0 -0

{solana_agent-31.2.6 → solana_agent-31.3.0}/PKG-INFO

@@ -1,6 +1,6 @@
  Metadata-Version: 2.3
  Name: solana-agent
- Version: 31.2.6
+ Version: 31.3.0
  Summary: AI Agents for Solana
  License: MIT
  Keywords: solana,solana ai,solana agent,ai,ai agent,ai agents
@@ -98,6 +98,7 @@ Smart workflows are as easy as combining your tools and prompts.
  * Simple agent definition using JSON
  * Designed for a multi-agent swarm
  * Fast multi-modal processing of text, audio, and images
+ * Dual modality realtime streaming with simultaneous audio and text output
  * Smart workflows that keep flows simple and smart
  * Interact with the Solana blockchain with many useful tools
  * MCP tool usage with first-class support for [Zapier](https://zapier.com/mcp)
@@ -132,7 +133,7 @@ Smart workflows are as easy as combining your tools and prompts.
  **OpenAI**
  * [gpt-4.1](https://platform.openai.com/docs/models/gpt-4.1) (agent & router)
  * [text-embedding-3-large](https://platform.openai.com/docs/models/text-embedding-3-large) (embedding)
- * [gpt-realtime](https://platform.openai.com/docs/models/gpt-realtime) (realtime audio agent)
+ * [gpt-realtime](https://platform.openai.com/docs/models/gpt-realtime) (realtime audio agent with dual modality support)
  * [tts-1](https://platform.openai.com/docs/models/tts-1) (audio TTS)
  * [gpt-4o-mini-transcribe](https://platform.openai.com/docs/models/gpt-4o-mini-transcribe) (audio transcription)

@@ -281,6 +282,7 @@ async for response in solana_agent.process("user123", "What is the latest news o
  ### Audio/Text Streaming

  ```python
+ ## Realtime Usage
  from solana_agent import SolanaAgent

  config = {
@@ -311,28 +313,32 @@ async for response in solana_agent.process("user123", audio_content, audio_input

  ### Realtime Audio Streaming

- If input and/or output is encoded (compressed) like mp4/aac then you must have `ffmpeg` installed.
+ If input and/or output is encoded (compressed) like mp4/mp3 then you must have `ffmpeg` installed.

  Due to the overhead of the router (API call) - realtime only supports a single agent setup.

  Realtime uses MongoDB for memory so Zep is not needed.

+ By default, when `realtime=True` and you supply raw/encoded audio bytes as input, the system **always skips the HTTP transcription (STT) path** and relies solely on the realtime websocket session for input transcription. If you don't specify `rt_transcription_model`, a sensible default (`gpt-4o-mini-transcribe`) is auto-selected so you still receive input transcript events with minimal latency.
+
+ Implications:
+ - `llm_provider.transcribe_audio` is never invoked for realtime turns.
+ - Lower end-to-end latency (no duplicate network round trip for STT).
+ - Unified transcript sourcing from realtime events.
+ - If you explicitly want to disable transcription altogether, send text (not audio bytes) or ignore transcript events client-side.
+
  This example will work using expo-audio on Android and iOS.

  ```python
  from solana_agent import SolanaAgent

  solana_agent = SolanaAgent(config=config)
-
- audio_content = await audio_file.read()
-
- async def generate():
- async for chunk in solana_agent.process(
- user_id=user_id,
+ user_id="user123",
  message=audio_content,
  realtime=True,
  rt_encode_input=True,
  rt_encode_output=True,
+ rt_output_modalities=["audio"],
  rt_voice="marin",
  output_format="audio",
  audio_output_format="mp3",
@@ -350,6 +356,106 @@ return StreamingResponse(
  "X-Accel-Buffering": "no",
  },
  )
+ ```
+
+ ### Realtime Text Streaming
+
+ Due to the overhead of the router (API call) - realtime only supports a single agent setup.
+
+ Realtime uses MongoDB for memory so Zep is not needed.
+
+ When using realtime with text input, no audio transcription is needed. The same bypass rules apply—HTTP STT is never called in realtime mode.
+
+ ```python
+ from solana_agent import SolanaAgent
+
+ solana_agent = SolanaAgent(config=config)
+
+ async def generate():
+ async for chunk in solana_agent.process(
+ user_id="user123",
+ message="What is the latest news on Solana?",
+ realtime=True,
+ rt_output_modalities=["text"],
+ ):
+ yield chunk
+ ```
+
+ ### Dual Modality Realtime Streaming
+
+ Solana Agent supports **dual modality realtime streaming**, allowing you to stream both audio and text simultaneously from a single realtime session. This enables rich conversational experiences where users can receive both voice responses and text transcripts in real-time.
+
+ #### Features
+ - **Simultaneous Audio & Text**: Stream both modalities from the same conversation
+ - **Flexible Output**: Choose audio-only, text-only, or both modalities
+ - **Real-time Demuxing**: Automatically separate audio and text streams
+ - **Mobile Optimized**: Works seamlessly with compressed audio formats (MP4/AAC)
+ - **Memory Efficient**: Smart buffering and streaming for optimal performance
+
+ #### Mobile App Integration Example
+
+ ```python
+ from fastapi import UploadFile
+ from fastapi.responses import StreamingResponse
+ from solana_agent import SolanaAgent
+ from solana_agent.interfaces.providers.realtime import RealtimeChunk
+ import base64
+
+ solana_agent = SolanaAgent(config=config)
+
+ @app.post("/realtime/dual")
+ async def realtime_dual_endpoint(audio_file: UploadFile):
+ """
+ Dual modality (audio + text) realtime endpoint using Server-Sent Events (SSE).
+ Emits:
+ event: audio (base64 encoded audio frames)
+ event: transcript (incremental text)
+ Notes:
+ - Do NOT set output_format when using both modalities.
+ - If only one modality is requested, plain str (text) or raw audio bytes may be yielded instead of RealtimeChunk.
+ """
+ audio_content = await audio_file.read()
+
+ async def event_stream():
+ async for chunk in solana_agent.process(
+ user_id="mobile_user",
+ message=audio_content,
+ realtime=True,
+ rt_encode_input=True,
+ rt_encode_output=True,
+ rt_output_modalities=["audio", "text"],
+ rt_voice="marin",
+ audio_input_format="mp4",
+ audio_output_format="mp3",
+ # Optionally lock transcription model (otherwise default is auto-selected):
+ # rt_transcription_model="gpt-4o-mini-transcribe",
+ ):
+ if isinstance(chunk, RealtimeChunk):
+ if chunk.is_audio and chunk.audio_data:
+ b64 = base64.b64encode(chunk.audio_data).decode("ascii")
+ yield f"event: audio\ndata: {b64}\n\n"
+ elif chunk.is_text and chunk.text_data:
+ # Incremental transcript (not duplicated at finalize)
+ yield f"event: transcript\ndata: {chunk.text_data}\n\n"
+ continue
+ # (Defensive) fallback: if something else appears
+ if isinstance(chunk, bytes):
+ b64 = base64.b64encode(chunk).decode("ascii")
+ yield f"event: audio\ndata: {b64}\n\n"
+ elif isinstance(chunk, str):
+ yield f"event: transcript\ndata: {chunk}\n\n"
+
+ yield "event: done\ndata: end\n\n"
+
+ return StreamingResponse(
+ event_stream(),
+ media_type="text/event-stream",
+ headers={
+ "Cache-Control": "no-store",
+ "Access-Control-Allow-Origin": "*",
+ },
+ )
+ ```
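
The `RealtimeChunk` type consumed in the example above is defined in the new `solana_agent/interfaces/providers/realtime.py` (+212 lines, not expanded in this diff). Below is a minimal sketch of the shape implied by the attributes the example relies on (`is_audio`, `is_text`, `audio_data`, `text_data`); any field or property beyond those is an assumption, not the library's actual definition.

```python
# Hypothetical sketch of the RealtimeChunk shape implied by the README example above.
# The real class lives in solana_agent/interfaces/providers/realtime.py (not shown
# in this diff); only is_audio/is_text/audio_data/text_data are taken from the
# example — everything else here is an assumption.
from dataclasses import dataclass
from typing import Literal, Optional, Union


@dataclass
class RealtimeChunkSketch:
    modality: Literal["audio", "text"]  # which stream this chunk belongs to
    data: Union[bytes, str]             # raw/encoded audio bytes or a text delta

    @property
    def is_audio(self) -> bool:
        return self.modality == "audio"

    @property
    def is_text(self) -> bool:
        return self.modality == "text"

    @property
    def audio_data(self) -> Optional[bytes]:
        return self.data if self.is_audio and isinstance(self.data, bytes) else None

    @property
    def text_data(self) -> Optional[str]:
        return self.data if self.is_text and isinstance(self.data, str) else None
```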

  ### Image/Text Streaming

{solana_agent-31.2.6 → solana_agent-31.3.0}/README.md

@@ -62,6 +62,7 @@ Smart workflows are as easy as combining your tools and prompts.
  * Simple agent definition using JSON
  * Designed for a multi-agent swarm
  * Fast multi-modal processing of text, audio, and images
+ * Dual modality realtime streaming with simultaneous audio and text output
  * Smart workflows that keep flows simple and smart
  * Interact with the Solana blockchain with many useful tools
  * MCP tool usage with first-class support for [Zapier](https://zapier.com/mcp)
@@ -96,7 +97,7 @@ Smart workflows are as easy as combining your tools and prompts.
  **OpenAI**
  * [gpt-4.1](https://platform.openai.com/docs/models/gpt-4.1) (agent & router)
  * [text-embedding-3-large](https://platform.openai.com/docs/models/text-embedding-3-large) (embedding)
- * [gpt-realtime](https://platform.openai.com/docs/models/gpt-realtime) (realtime audio agent)
+ * [gpt-realtime](https://platform.openai.com/docs/models/gpt-realtime) (realtime audio agent with dual modality support)
  * [tts-1](https://platform.openai.com/docs/models/tts-1) (audio TTS)
  * [gpt-4o-mini-transcribe](https://platform.openai.com/docs/models/gpt-4o-mini-transcribe) (audio transcription)

@@ -245,6 +246,7 @@ async for response in solana_agent.process("user123", "What is the latest news o
  ### Audio/Text Streaming

  ```python
+ ## Realtime Usage
  from solana_agent import SolanaAgent

  config = {
@@ -275,28 +277,32 @@ async for response in solana_agent.process("user123", audio_content, audio_input

  ### Realtime Audio Streaming

- If input and/or output is encoded (compressed) like mp4/aac then you must have `ffmpeg` installed.
+ If input and/or output is encoded (compressed) like mp4/mp3 then you must have `ffmpeg` installed.

  Due to the overhead of the router (API call) - realtime only supports a single agent setup.

  Realtime uses MongoDB for memory so Zep is not needed.

+ By default, when `realtime=True` and you supply raw/encoded audio bytes as input, the system **always skips the HTTP transcription (STT) path** and relies solely on the realtime websocket session for input transcription. If you don't specify `rt_transcription_model`, a sensible default (`gpt-4o-mini-transcribe`) is auto-selected so you still receive input transcript events with minimal latency.
+
+ Implications:
+ - `llm_provider.transcribe_audio` is never invoked for realtime turns.
+ - Lower end-to-end latency (no duplicate network round trip for STT).
+ - Unified transcript sourcing from realtime events.
+ - If you explicitly want to disable transcription altogether, send text (not audio bytes) or ignore transcript events client-side.
+
  This example will work using expo-audio on Android and iOS.

  ```python
  from solana_agent import SolanaAgent

  solana_agent = SolanaAgent(config=config)
-
- audio_content = await audio_file.read()
-
- async def generate():
- async for chunk in solana_agent.process(
- user_id=user_id,
+ user_id="user123",
  message=audio_content,
  realtime=True,
  rt_encode_input=True,
  rt_encode_output=True,
+ rt_output_modalities=["audio"],
  rt_voice="marin",
  output_format="audio",
  audio_output_format="mp3",
@@ -314,6 +320,106 @@ return StreamingResponse(
  "X-Accel-Buffering": "no",
  },
  )
+ ```
+
+ ### Realtime Text Streaming
+
+ Due to the overhead of the router (API call) - realtime only supports a single agent setup.
+
+ Realtime uses MongoDB for memory so Zep is not needed.
+
+ When using realtime with text input, no audio transcription is needed. The same bypass rules apply—HTTP STT is never called in realtime mode.
+
+ ```python
+ from solana_agent import SolanaAgent
+
+ solana_agent = SolanaAgent(config=config)
+
+ async def generate():
+ async for chunk in solana_agent.process(
+ user_id="user123",
+ message="What is the latest news on Solana?",
+ realtime=True,
+ rt_output_modalities=["text"],
+ ):
+ yield chunk
+ ```
+
+ ### Dual Modality Realtime Streaming
+
+ Solana Agent supports **dual modality realtime streaming**, allowing you to stream both audio and text simultaneously from a single realtime session. This enables rich conversational experiences where users can receive both voice responses and text transcripts in real-time.
+
+ #### Features
+ - **Simultaneous Audio & Text**: Stream both modalities from the same conversation
+ - **Flexible Output**: Choose audio-only, text-only, or both modalities
+ - **Real-time Demuxing**: Automatically separate audio and text streams
+ - **Mobile Optimized**: Works seamlessly with compressed audio formats (MP4/AAC)
+ - **Memory Efficient**: Smart buffering and streaming for optimal performance
+
+ #### Mobile App Integration Example
+
+ ```python
+ from fastapi import UploadFile
+ from fastapi.responses import StreamingResponse
+ from solana_agent import SolanaAgent
+ from solana_agent.interfaces.providers.realtime import RealtimeChunk
+ import base64
+
+ solana_agent = SolanaAgent(config=config)
+
+ @app.post("/realtime/dual")
+ async def realtime_dual_endpoint(audio_file: UploadFile):
+ """
+ Dual modality (audio + text) realtime endpoint using Server-Sent Events (SSE).
+ Emits:
+ event: audio (base64 encoded audio frames)
+ event: transcript (incremental text)
+ Notes:
+ - Do NOT set output_format when using both modalities.
+ - If only one modality is requested, plain str (text) or raw audio bytes may be yielded instead of RealtimeChunk.
+ """
+ audio_content = await audio_file.read()
+
+ async def event_stream():
+ async for chunk in solana_agent.process(
+ user_id="mobile_user",
+ message=audio_content,
+ realtime=True,
+ rt_encode_input=True,
+ rt_encode_output=True,
+ rt_output_modalities=["audio", "text"],
+ rt_voice="marin",
+ audio_input_format="mp4",
+ audio_output_format="mp3",
+ # Optionally lock transcription model (otherwise default is auto-selected):
+ # rt_transcription_model="gpt-4o-mini-transcribe",
+ ):
+ if isinstance(chunk, RealtimeChunk):
+ if chunk.is_audio and chunk.audio_data:
+ b64 = base64.b64encode(chunk.audio_data).decode("ascii")
+ yield f"event: audio\ndata: {b64}\n\n"
+ elif chunk.is_text and chunk.text_data:
+ # Incremental transcript (not duplicated at finalize)
+ yield f"event: transcript\ndata: {chunk.text_data}\n\n"
+ continue
+ # (Defensive) fallback: if something else appears
+ if isinstance(chunk, bytes):
+ b64 = base64.b64encode(chunk).decode("ascii")
+ yield f"event: audio\ndata: {b64}\n\n"
+ elif isinstance(chunk, str):
+ yield f"event: transcript\ndata: {chunk}\n\n"
+
+ yield "event: done\ndata: end\n\n"
+
+ return StreamingResponse(
+ event_stream(),
+ media_type="text/event-stream",
+ headers={
+ "Cache-Control": "no-store",
+ "Access-Control-Allow-Origin": "*",
+ },
+ )
+ ```
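
The endpoint above emits named SSE events (`audio` carrying base64-encoded frames, `transcript` carrying text deltas, and a final `done`). Below is a minimal Python consumer sketch for that protocol; it assumes `httpx`, a server at `http://localhost:8000/realtime/dual`, an upload field named `audio_file`, and a local `input.mp4` — all placeholders, not part of the package.

```python
# Minimal SSE consumer for the dual-modality endpoint sketched above.
# Assumptions: server at http://localhost:8000/realtime/dual, upload field
# "audio_file", local file input.mp4 — all illustrative placeholders.
import asyncio
import base64

import httpx


async def consume_dual_stream() -> None:
    audio_out = bytearray()
    transcript_parts: list[str] = []
    event = None

    async with httpx.AsyncClient(timeout=None) as client:
        with open("input.mp4", "rb") as f:
            async with client.stream(
                "POST",
                "http://localhost:8000/realtime/dual",
                files={"audio_file": ("input.mp4", f, "video/mp4")},
            ) as resp:
                async for line in resp.aiter_lines():
                    if line.startswith("event: "):
                        event = line[len("event: "):]
                    elif line.startswith("data: "):
                        data = line[len("data: "):]
                        if event == "audio":
                            audio_out.extend(base64.b64decode(data))
                        elif event == "transcript":
                            transcript_parts.append(data)
                        elif event == "done":
                            break

    # Reassemble the two modalities on the client side.
    with open("reply.mp3", "wb") as out:
        out.write(audio_out)
    print("Transcript:", "".join(transcript_parts))


if __name__ == "__main__":
    asyncio.run(consume_dual_stream())
```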

  ### Image/Text Streaming

{solana_agent-31.2.6 → solana_agent-31.3.0}/pyproject.toml

@@ -1,6 +1,6 @@
  [tool.poetry]
  name = "solana-agent"
- version = "31.2.6"
+ version = "31.3.0"
  description = "AI Agents for Solana"
  authors = ["Bevan Hunt <bevan@bevanhunt.com>"]
  license = "MIT"

{solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/adapters/openai_realtime_ws.py

@@ -102,16 +102,30 @@ class OpenAIRealtimeWebSocketSession(BaseRealtimeSession):
  ]
  model = self.options.model or "gpt-realtime"
  uri = f"{self.url}?model={model}"
- logger.info(
- "Realtime WS connecting: uri=%s, input=%s@%sHz, output=%s@%sHz, voice=%s, vad=%s",
- uri,
- self.options.input_mime,
- self.options.input_rate_hz,
- self.options.output_mime,
- self.options.output_rate_hz,
- self.options.voice,
- self.options.vad_enabled,
- )
+
+ # Determine if audio output should be configured for logging
+ modalities = self.options.output_modalities or ["audio", "text"]
+ should_configure_audio_output = "audio" in modalities
+
+ if should_configure_audio_output:
+ logger.info(
+ "Realtime WS connecting: uri=%s, input=%s@%sHz, output=%s@%sHz, voice=%s, vad=%s",
+ uri,
+ self.options.input_mime,
+ self.options.input_rate_hz,
+ self.options.output_mime,
+ self.options.output_rate_hz,
+ self.options.voice,
+ self.options.vad_enabled,
+ )
+ else:
+ logger.info(
+ "Realtime WS connecting: uri=%s, input=%s@%sHz, text-only output, vad=%s",
+ uri,
+ self.options.input_mime,
+ self.options.input_rate_hz,
+ self.options.vad_enabled,
+ )
  self._ws = await websockets.connect(
  uri, additional_headers=headers, max_size=None
  )
@@ -165,11 +179,16 @@ class OpenAIRealtimeWebSocketSession(BaseRealtimeSession):
  cleaned.append(t)
  return cleaned

+ # Determine if audio output should be configured
+ modalities = self.options.output_modalities or ["audio", "text"]
+ should_configure_audio_output = "audio" in modalities
+
+ # Build session.update per docs (nested audio object)
  session_payload: Dict[str, Any] = {
  "type": "session.update",
  "session": {
  "type": "realtime",
- "output_modalities": ["audio"],
+ "output_modalities": modalities,
  "audio": {
  "input": {
  "format": {
@@ -178,16 +197,22 @@ class OpenAIRealtimeWebSocketSession(BaseRealtimeSession):
  },
  "turn_detection": td_input,
  },
- "output": {
- "format": {
- "type": self.options.output_mime or "audio/pcm",
- "rate": int(self.options.output_rate_hz or 24000),
- },
- "voice": self.options.voice,
- "speed": float(
- getattr(self.options, "voice_speed", 1.0) or 1.0
- ),
- },
+ **(
+ {
+ "output": {
+ "format": {
+ "type": self.options.output_mime or "audio/pcm",
+ "rate": int(self.options.output_rate_hz or 24000),
+ },
+ "voice": self.options.voice,
+ "speed": float(
+ getattr(self.options, "voice_speed", 1.0) or 1.0
+ ),
+ }
+ }
+ if should_configure_audio_output
+ else {}
+ ),
  },
  # Note: no top-level turn_detection; nested under audio.input
  **({"prompt": prompt_block} if prompt_block else {}),
@@ -204,13 +229,45 @@ class OpenAIRealtimeWebSocketSession(BaseRealtimeSession):
  ),
  },
  }
- logger.info(
- "Realtime WS: sending session.update (voice=%s, vad=%s, output=%s@%s)",
- self.options.voice,
- self.options.vad_enabled,
- (self.options.output_mime or "audio/pcm"),
- int(self.options.output_rate_hz or 24000),
- )
+ # Optional realtime transcription configuration
+ try:
+ tr_model = getattr(self.options, "transcription_model", None)
+ if tr_model:
+ audio_obj = session_payload["session"].setdefault("audio", {})
+ # Attach input transcription config per GA schema
+ transcription_cfg: Dict[str, Any] = {"model": tr_model}
+ lang = getattr(self.options, "transcription_language", None)
+ if lang:
+ transcription_cfg["language"] = lang
+ prompt_txt = getattr(self.options, "transcription_prompt", None)
+ if prompt_txt is not None:
+ transcription_cfg["prompt"] = prompt_txt
+ if getattr(self.options, "transcription_include_logprobs", False):
+ session_payload["session"].setdefault("include", []).append(
+ "item.input_audio_transcription.logprobs"
+ )
+ nr = getattr(self.options, "transcription_noise_reduction", None)
+ if nr is not None:
+ audio_obj["noise_reduction"] = bool(nr)
+ # Place under audio.input.transcription per current server conventions
+ audio_obj.setdefault("input", {}).setdefault(
+ "transcription", transcription_cfg
+ )
+ except Exception:
+ logger.exception("Failed to attach transcription config to session.update")
+ if should_configure_audio_output:
+ logger.info(
+ "Realtime WS: sending session.update (voice=%s, vad=%s, output=%s@%s)",
+ self.options.voice,
+ self.options.vad_enabled,
+ (self.options.output_mime or "audio/pcm"),
+ int(self.options.output_rate_hz or 24000),
+ )
+ else:
+ logger.info(
+ "Realtime WS: sending session.update (text-only, vad=%s)",
+ self.options.vad_enabled,
+ )
  # Log exact session.update payload and mark awaiting session.updated
  try:
  logger.info(
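
Taken together, the changes above mean the `session.update` event now varies with the configured output modalities: the nested `audio.output` block (format, voice, speed) is emitted only when `"audio"` is among them, and an `audio.input.transcription` block is attached when a transcription model is set. A rough sketch of the resulting payload for a dual-modality session follows; the values are illustrative, and the exact input-format and turn-detection fields come from code not shown in this hunk.

```python
# Illustrative shape of the session.update event produced by the code above for a
# dual-modality session with input transcription enabled. Rates, voice, and model
# names are example placeholders, not defaults asserted by the library.
session_update = {
    "type": "session.update",
    "session": {
        "type": "realtime",
        "output_modalities": ["audio", "text"],
        "audio": {
            "input": {
                "format": {"type": "audio/pcm", "rate": 24000},
                "turn_detection": {"type": "server_vad"},  # actual shape comes from td_input
                "transcription": {
                    "model": "gpt-4o-mini-transcribe",
                    # "language" / "prompt" added only when configured
                },
            },
            # Present only when "audio" is in output_modalities:
            "output": {
                "format": {"type": "audio/pcm", "rate": 24000},
                "voice": "marin",
                "speed": 1.0,
            },
        },
        # "instructions", "prompt", "tools", and the optional "include" list for
        # transcription logprobs are merged in by the surrounding code.
    },
}
```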
@@ -231,7 +288,7 @@ class OpenAIRealtimeWebSocketSession(BaseRealtimeSession):
  logger.warning(
  "Realtime WS: instructions missing/empty in session.update"
  )
- if not voice:
+ if not voice and should_configure_audio_output:
  logger.warning("Realtime WS: voice missing in session.update")
  except Exception:
  pass
@@ -632,6 +689,20 @@ class OpenAIRealtimeWebSocketSession(BaseRealtimeSession):
  len(final),
  )
  self._out_text_buffers.pop(rid, None)
+ # Always terminate the output transcript stream for this response when text-only.
+ try:
+ # Only enqueue sentinel when no audio modality is configured
+ modalities = (
+ getattr(self.options, "output_modalities", None)
+ or []
+ )
+ if "audio" not in modalities:
+ self._out_tr_queue.put_nowait(None)
+ logger.debug(
+ "Enqueued transcript termination sentinel (text-only response)"
+ )
+ except Exception:
+ pass
  except Exception:
  pass
  elif (
@@ -1033,6 +1104,47 @@ class OpenAIRealtimeWebSocketSession(BaseRealtimeSession):
  else:
  patch[k] = raw[k]

+ # --- Inject realtime transcription config if options were updated after initial connect ---
+ try:
+ tr_model = getattr(self.options, "transcription_model", None)
+ if tr_model and isinstance(patch, dict):
+ # Ensure audio/input containers exist without overwriting caller provided fields
+ aud = patch.setdefault("audio", {})
+ inp = aud.setdefault("input", {})
+ # Only add if not explicitly provided in this patch
+ if "transcription" not in inp:
+ transcription_cfg: Dict[str, Any] = {"model": tr_model}
+ lang = getattr(self.options, "transcription_language", None)
+ if lang:
+ transcription_cfg["language"] = lang
+ prompt_txt = getattr(self.options, "transcription_prompt", None)
+ if prompt_txt is not None:
+ transcription_cfg["prompt"] = prompt_txt
+ nr = getattr(self.options, "transcription_noise_reduction", None)
+ if nr is not None:
+ aud["noise_reduction"] = bool(nr)
+ if getattr(self.options, "transcription_include_logprobs", False):
+ patch.setdefault("include", [])
+ if (
+ "item.input_audio_transcription.logprobs"
+ not in patch["include"]
+ ):
+ patch["include"].append(
+ "item.input_audio_transcription.logprobs"
+ )
+ inp["transcription"] = transcription_cfg
+ try:
+ logger.debug(
+ "Realtime WS: update_session injected transcription config model=%s",
+ tr_model,
+ )
+ except Exception:
+ pass
+ except Exception:
+ logger.exception(
+ "Realtime WS: failed injecting transcription config in update_session"
+ )
+
  # Ensure tools are cleaned even if provided only under audio or elsewhere
  if "tools" in patch:
  patch["tools"] = _strip_tool_strict(patch["tools"]) # idempotent
@@ -1040,9 +1152,12 @@ class OpenAIRealtimeWebSocketSession(BaseRealtimeSession):
  # Per server requirements, always include session.type and output_modalities
  try:
  patch["type"] = "realtime"
- # Preserve caller-provided output_modalities if present, otherwise default to audio
+ # Preserve caller-provided output_modalities if present, otherwise default to configured modalities
  if "output_modalities" not in patch:
- patch["output_modalities"] = ["audio"]
+ patch["output_modalities"] = self.options.output_modalities or [
+ "audio",
+ "text",
+ ]
  except Exception:
  pass

@@ -1148,6 +1263,13 @@ class OpenAIRealtimeWebSocketSession(BaseRealtimeSession):
  except Exception:
  pass

+ async def create_conversation_item(
+ self, item: Dict[str, Any]
+ ) -> None: # pragma: no cover
+ """Create a conversation item (e.g., for text input)."""
+ payload = {"type": "conversation.item.create", "item": item}
+ await self._send_tracked(payload, label="conversation.item.create")
+
  async def create_response(
  self, response_patch: Optional[Dict[str, Any]] = None
  ) -> None: # pragma: no cover
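
The new `create_conversation_item` is a thin wrapper that sends a `conversation.item.create` event with a caller-supplied item, which is how text input reaches a realtime session. The item schema itself is not shown in this diff; the sketch below uses the user-message shape from the OpenAI Realtime API (`input_text` content) and a hypothetical, already-connected `session`, so treat it as illustrative rather than the library's documented call pattern.

```python
# Hypothetical usage of the new create_conversation_item() helper for text input.
# The item structure follows the OpenAI Realtime API's conversation.item.create
# message shape; `session` stands in for an already-connected
# OpenAIRealtimeWebSocketSession and is not taken from this diff.
async def send_user_text(session, text: str) -> None:
    await session.create_conversation_item(
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        }
    )
    # Ask the server to generate a reply for the item just added.
    await session.create_response()
```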
@@ -1639,6 +1761,13 @@ class OpenAITranscriptionWebSocketSession(BaseRealtimeSession):
  async def clear_input(self) -> None: # pragma: no cover
  await self._send({"type": "input_audio_buffer.clear"})

+ async def create_conversation_item(
+ self, item: Dict[str, Any]
+ ) -> None: # pragma: no cover
+ """Create a conversation item (e.g., for text input)."""
+ payload = {"type": "conversation.item.create", "item": item}
+ await self._send_tracked(payload, label="conversation.item.create")
+
  async def create_response(
  self, response_patch: Optional[Dict[str, Any]] = None
  ) -> None: # pragma: no cover

{solana_agent-31.2.6 → solana_agent-31.3.0}/solana_agent/client/solana_agent.py

@@ -16,6 +16,7 @@ from solana_agent.interfaces.client.client import SolanaAgent as SolanaAgentInte
  from solana_agent.interfaces.plugins.plugins import Tool
  from solana_agent.services.knowledge_base import KnowledgeBaseService
  from solana_agent.interfaces.services.routing import RoutingService as RoutingInterface
+ from solana_agent.interfaces.providers.realtime import RealtimeChunk


  class SolanaAgent(SolanaAgentInterface):
@@ -57,6 +58,7 @@ class SolanaAgent(SolanaAgentInterface):
  vad: Optional[bool] = False,
  rt_encode_input: bool = False,
  rt_encode_output: bool = False,
+ rt_output_modalities: Optional[List[Literal["audio", "text"]]] = None,
  rt_voice: Literal[
  "alloy",
  "ash",
@@ -90,7 +92,9 @@ class SolanaAgent(SolanaAgentInterface):
  router: Optional[RoutingInterface] = None,
  images: Optional[List[Union[str, bytes]]] = None,
  output_model: Optional[Type[BaseModel]] = None,
- ) -> AsyncGenerator[Union[str, bytes, BaseModel], None]: # pragma: no cover
+ ) -> AsyncGenerator[
+ Union[str, bytes, BaseModel, RealtimeChunk], None
+ ]: # pragma: no cover
  """Process a user message (text or audio) and optional images, returning the response stream.

  Args:
@@ -104,6 +108,7 @@ class SolanaAgent(SolanaAgentInterface):
  vad: Whether to use voice activity detection (for audio input)
  rt_encode_input: Whether to re-encode input audio for compatibility
  rt_encode_output: Whether to re-encode output audio for compatibility
+ rt_output_modalities: Modalities to return in realtime (default both if None)
  rt_voice: Voice to use for realtime audio output
  audio_voice: Voice to use for audio output
  audio_output_format: Audio output format
@@ -124,6 +129,7 @@ class SolanaAgent(SolanaAgentInterface):
  vad=vad,
  rt_encode_input=rt_encode_input,
  rt_encode_output=rt_encode_output,
+ rt_output_modalities=rt_output_modalities,
  rt_voice=rt_voice,
  audio_voice=audio_voice,
  audio_output_format=audio_output_format,