PyPI - vision-agents-plugins-deepgram - Versions diffs - 0.1.5__py3-none-any.whl → 0.1.6__py3-none-any.whl - Mend

vision-agents-plugins-deepgram 0.1.5__py3-none-any.whl → 0.1.6__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of vision-agents-plugins-deepgram might be problematic.

Files changed (9)

PKG-INFO +15 -11
README.md +13 -9
pyproject.toml +1 -1
vision_agents/plugins/deepgram/stt.py +229 -156
vision_agents/plugins/deepgram/utils.py +18 -0
{vision_agents_plugins_deepgram-0.1.5.dist-info → vision_agents_plugins_deepgram-0.1.6.dist-info}/METADATA +15 -11
vision_agents_plugins_deepgram-0.1.6.dist-info/RECORD +13 -0
vision_agents_plugins_deepgram-0.1.5.dist-info/RECORD +0 -11
{vision_agents_plugins_deepgram-0.1.5.dist-info → vision_agents_plugins_deepgram-0.1.6.dist-info}/WHEEL +0 -0

PKG-INFO CHANGED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: vision-agents-plugins-deepgram
-Version: 0.1.5
+Version: 0.1.6
 Summary: Deepgram STT integration for Vision Agents
 Project-URL: Documentation, https://visionagents.ai/
 Project-URL: Website, https://visionagents.ai/
@@ -8,31 +8,32 @@ Project-URL: Source, https://github.com/GetStream/Vision-Agents
 License-Expression: MIT
 Keywords: AI,STT,agents,deepgram,speech-to-text,transcription,voice agents
 Requires-Python: >=3.10
-Requires-Dist: deepgram-sdk==4.8.1
+Requires-Dist: deepgram-sdk<5.1,>=5.0.0
 Requires-Dist: numpy<2.3,>=2.2.6
 Requires-Dist: vision-agents
 Description-Content-Type: text/markdown
 # Deepgram Speech-to-Text Plugin
-A high-quality Speech-to-Text (STT) plugin for GetStream that uses the Deepgram API.
+A high-quality Speech-to-Text (STT) plugin for Vision agents that uses the Deepgram API.
 ## Installation
 ```bash
-pip install getstream-plugins-deepgram
+uv add vision-agents-plugins-deepgram
 ```
 ## Usage
 ```python
-from getstream.plugins.deepgram import DeepgramSTT
+from vision_agents.plugins import deepgram
+from getstream.video.rtc.track_util import PcmData
 # Initialize with API key from environment variable
-stt = DeepgramSTT()
+stt = deepgram.STT()
 # Or specify API key directly
-stt = DeepgramSTT(api_key="your_deepgram_api_key")
+stt = deepgram.STT(api_key="your_deepgram_api_key")
 # Register event handlers
 @stt.on("transcript")
@@ -44,6 +45,7 @@ def on_partial(text, user, metadata):
     print(f"Partial transcript from {user}: {text}")
 # Process audio
+pcm_data = PcmData(samples=b"\x00\x00" * 1000, sample_rate=48000, format="s16")
 await stt.process_audio(pcm_data)
 # When done
@@ -52,14 +54,16 @@ await stt.close()
 ## Configuration Options
-- `api_key`: Deepgram API key (default: reads from DEEPGRAM_API_KEY environment variable)
-- `options`: Deepgram LiveOptions for configuring the transcription
+- `api_key`: Deepgram API key (default: reads from `DEEPGRAM_API_KEY` environment variable)
+- `options`: Deepgram options for configuring the transcription.
+See the [Deepgram Listen V1 Connect API documentation](https://github.com/deepgram/deepgram-python-sdk/blob/main/websockets-reference.md#%EF%B8%8F-parameters) for more details.
 - `sample_rate`: Sample rate of the audio in Hz (default: 16000)
 - `language`: Language code for transcription (default: "en-US")
-- `keep_alive_interval`: Interval in seconds to send keep-alive messages (default: 5.0)
+- `keep_alive_interval`: Interval in seconds to send keep-alive messages (default: 1.0s)
+- `connection_timeout`: Timeout to wait for the Deepgram connection to be established before skipping the  in seconds to send keep-alive messages (default: 15.0s)
 ## Requirements
 - Python 3.10+
-- deepgram-sdk>=4.5.0
+- deepgram-sdk>=5.0.0,<5.1
 - numpy>=2.2.6,<2.3

README.md CHANGED

@@ -1,23 +1,24 @@
 # Deepgram Speech-to-Text Plugin
-A high-quality Speech-to-Text (STT) plugin for GetStream that uses the Deepgram API.
+A high-quality Speech-to-Text (STT) plugin for Vision agents that uses the Deepgram API.
 ## Installation
 ```bash
-pip install getstream-plugins-deepgram
+uv add vision-agents-plugins-deepgram
 ```
 ## Usage
 ```python
-from getstream.plugins.deepgram import DeepgramSTT
+from vision_agents.plugins import deepgram
+from getstream.video.rtc.track_util import PcmData
 # Initialize with API key from environment variable
-stt = DeepgramSTT()
+stt = deepgram.STT()
 # Or specify API key directly
-stt = DeepgramSTT(api_key="your_deepgram_api_key")
+stt = deepgram.STT(api_key="your_deepgram_api_key")
 # Register event handlers
 @stt.on("transcript")
@@ -29,6 +30,7 @@ def on_partial(text, user, metadata):
     print(f"Partial transcript from {user}: {text}")
 # Process audio
+pcm_data = PcmData(samples=b"\x00\x00" * 1000, sample_rate=48000, format="s16")
 await stt.process_audio(pcm_data)
 # When done
@@ -37,14 +39,16 @@ await stt.close()
 ## Configuration Options
-- `api_key`: Deepgram API key (default: reads from DEEPGRAM_API_KEY environment variable)
-- `options`: Deepgram LiveOptions for configuring the transcription
+- `api_key`: Deepgram API key (default: reads from `DEEPGRAM_API_KEY` environment variable)
+- `options`: Deepgram options for configuring the transcription.
+See the [Deepgram Listen V1 Connect API documentation](https://github.com/deepgram/deepgram-python-sdk/blob/main/websockets-reference.md#%EF%B8%8F-parameters) for more details.
 - `sample_rate`: Sample rate of the audio in Hz (default: 16000)
 - `language`: Language code for transcription (default: "en-US")
-- `keep_alive_interval`: Interval in seconds to send keep-alive messages (default: 5.0)
+- `keep_alive_interval`: Interval in seconds to send keep-alive messages (default: 1.0s)
+- `connection_timeout`: Timeout to wait for the Deepgram connection to be established before skipping the  in seconds to send keep-alive messages (default: 15.0s)
 ## Requirements
 - Python 3.10+
-- deepgram-sdk>=4.5.0
+- deepgram-sdk>=5.0.0,<5.1
 - numpy>=2.2.6,<2.3
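
The `connection_timeout` option documented above corresponds, in the `stt.py` changes of this diff, to an `asyncio.Event` awaited through `asyncio.wait_for`. The following is a minimal sketch of that pattern only, not the plugin's code: `wait_until_connected` and `demo` are hypothetical names, and the short timeouts are chosen for illustration.

```python
import asyncio


async def wait_until_connected(connected: asyncio.Event, timeout: float) -> bool:
    """Return True if `connected` is set within `timeout` seconds, else False."""
    if connected.is_set():
        return True
    try:
        await asyncio.wait_for(connected.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        # The caller can log the timeout and skip the current audio chunk.
        return False
    return True


async def demo() -> tuple:
    connected = asyncio.Event()
    # Nothing sets the event yet, so this wait times out.
    timed_out = not await wait_until_connected(connected, timeout=0.01)
    # Simulate the connection attempt finishing.
    connected.set()
    ready = await wait_until_connected(connected, timeout=0.01)
    return timed_out, ready
```

In the diff, the event is set in a `finally` block, so it signals that a connection attempt has finished rather than that it succeeded; a failed attempt is then caught by the `dg_connection is None` check in `_send_audio`.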

pyproject.toml CHANGED

@@ -12,7 +12,7 @@ requires-python = ">=3.10"
 license = "MIT"
 dependencies = [
     "vision-agents",
-    "deepgram-sdk==4.8.1",
+    "deepgram-sdk>=5.0.0,<5.1",
     "numpy>=2.2.6,<2.3",
 ]
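
The dependency moves from an exact pin (`==4.8.1`) to a range (`>=5.0.0,<5.1`). As a rough illustration of which versions that range admits, here is a simplified stdlib-only check. It is an assumption-laden sketch: it handles plain dotted numeric versions only and ignores the pre-release, post-release, and epoch rules of the real specifier semantics. `parse` and `in_range` are hypothetical helpers.

```python
def parse(version: str, width: int = 3) -> tuple:
    """Parse a dotted numeric version, padding with zeros so 5.1 == 5.1.0."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def in_range(version: str, lower: str = "5.0.0", upper: str = "5.1") -> bool:
    """Approximate `>=lower,<upper` for plain numeric versions."""
    return parse(lower) <= parse(version) < parse(upper)


print(in_range("4.8.1"))  # the previously pinned version
print(in_range("5.0.3"))  # a 5.0.x patch release
print(in_range("5.1.0"))  # the first version excluded by the upper bound
```

Under these simplifying assumptions, the old pin falls outside the new range, so installing 0.1.6 requires a major-version upgrade of the SDK.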

vision_agents/plugins/deepgram/stt.py CHANGED

@@ -1,17 +1,31 @@
-import json
+import asyncio
+import contextlib
 import logging
-from typing import Dict, Any, Optional, Tuple, List, Union, TYPE_CHECKING
-if TYPE_CHECKING:
-    from vision_agents.core.edge.types import Participant
-import numpy as np
 import os
 import time
+from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union
-from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions, DeepgramClientOptions
-from vision_agents.core import stt
+import numpy as np
+import websockets
+from deepgram import AsyncDeepgramClient
+from deepgram.core.events import EventType
+from deepgram.extensions.types.sockets import (
+    ListenV1ControlMessage,
+    ListenV1MetadataEvent,
+    ListenV1ResultsEvent,
+    ListenV1SpeechStartedEvent,
+    ListenV1UtteranceEndEvent,
+)
+from deepgram.listen.v1.socket_client import AsyncV1SocketClient
 from getstream.video.rtc.track_util import PcmData
+from vision_agents.core import stt
+from .utils import generate_silence
+if TYPE_CHECKING:
+    from vision_agents.core.edge.types import Participant
 logger = logging.getLogger(__name__)
@@ -35,11 +49,13 @@ class STT(stt.STT):
     def __init__(
         self,
         api_key: Optional[str] = None,
-        options: Optional[LiveOptions] = None,  # type: ignore
+        options: Optional[dict] = None,
         sample_rate: int = 48000,
         language: str = "en-US",
         interim_results: bool = True,
-        client: Optional[DeepgramClient] = None,
+        client: Optional[AsyncDeepgramClient] = None,
+        keep_alive_interval: float = 1.0,
+        connection_timeout: float = 15.0,
     ):
         """
         Initialize the Deepgram STT service.
@@ -51,6 +67,8 @@ class STT(stt.STT):
             sample_rate: Sample rate of the audio in Hz (default: 48000)
             language: Language code for transcription
             interim_results: Whether to emit interim results (partial transcripts with the partial_transcript event).
+            connection_timeout: Time to wait for the Deepgram connection to be established.
         """
         super().__init__(sample_rate=sample_rate)
@@ -64,147 +82,189 @@ class STT(stt.STT):
         # Initialize DeepgramClient with the API key
         logger.info("Initializing Deepgram client")
-        config = DeepgramClientOptions(
-            options={"keepalive": "true"}  # Comment this out to see the effect of not using keepalive
-        )
-        self.deepgram = client if client is not None else DeepgramClient(api_key, config)
-        self.dg_connection: Optional[Any] = None
-        self.options = options or LiveOptions(
-            model="nova-2",
-            language=language,
-            encoding="linear16",
-            sample_rate=sample_rate,
-            channels=1,
-            interim_results=interim_results,
+        self.deepgram = (
+            client if client is not None else AsyncDeepgramClient(api_key=api_key)
         )
+        self.dg_connection: Optional[AsyncV1SocketClient] = None
+        self.options = options or {
+            "model": "nova-2",
+            "language": language,
+            "encoding": "linear16",
+            "sample_rate": sample_rate,
+            "channels": 1,
+            "interim_results": interim_results,
+        }
         # Track current user context for associating transcripts with users
         self._current_user: Optional[Dict[str, Any]] = None
-        self._setup_connection()
+        # Generate a silence audio to use as keep-alive message
+        self._keep_alive_data = generate_silence(
+            sample_rate=sample_rate, duration_ms=10
+        )
+        self._keep_alive_interval = keep_alive_interval
+        self._stack = contextlib.AsyncExitStack()
+        # An event to detect that the connection was established once.
+        self._connected_once = asyncio.Event()
+        # Time to wait for connection to be established before sending the event
+        self._connection_timeout = connection_timeout
+        self._last_sent_at = float("-inf")
+        # Lock to prevent concurrent connection opening
+        self._connect_lock = asyncio.Lock()
-    def _handle_transcript_result(
-        self, is_final: bool, text: str, metadata: Dict[str, Any]
-    ):
+        # Start the listener loop in the background
+        asyncio.create_task(self.start())
+    async def start(self):
         """
-        Handle a transcript result by emitting it immediately.
+        Start the main task establishing the Deepgram connection and processing the events.
         """
-        # Emit immediately for real-time responsiveness
-        if is_final:
-            self._emit_transcript_event(text, self._current_user, metadata)
-        else:
-            self._emit_partial_transcript_event(text, self._current_user, metadata)
+        if self._is_closed:
+            logger.warning("Cannot setup connection - Deepgram instance is closed")
+            return None
-        logger.debug(
-            "Handled transcript result",
-            extra={
-                "is_final": is_final,
-                "text_length": len(text),
-            },
+        # Establish a Deepgram connection.
+        # Use a lock to make sure it's established only once
+        async with self._connect_lock:
+            if self.dg_connection is not None:
+                logger.debug("Connection already set up, skipping initialization")
+                return None
+            try:
+                logger.info("Creating a Deepgram connection with options %s", self.options)
+                dg_connection = await self._stack.enter_async_context(
+                    self.deepgram.listen.v1.connect(**self.options)
+                )
+            except Exception as e:
+                # Log the error and set connection to None
+                logger.exception("Error setting up Deepgram connection")
+                self.dg_connection = None
+                # Emit error immediately
+                self._emit_error_event(e, "Deepgram connection setup")
+                raise
+            finally:
+                self._connected_once.set()
+        self.dg_connection = dg_connection
+        # Start the keep-alive loop to keep the connection open
+        asyncio.create_task(self._keepalive_loop())
+        # Register event handlers
+        self.dg_connection.on(
+            EventType.OPEN,
+            lambda msg: logger.debug(f"Deepgram connection opened. message={msg}"),
         )
+        self.dg_connection.on(EventType.CLOSE, self._on_connection_close)
+        self.dg_connection.on(EventType.ERROR, self._on_connection_error)
+        self.dg_connection.on(EventType.MESSAGE, self._on_message)
+        # Start processing the events from Deepgram.
+        # This is a blocking call.
+        logger.debug("Listening to the events from a Deepgram connection")
+        await self.dg_connection.start_listening()
+        return None
-    def _setup_connection(self):
-        """Set up the Deepgram connection with event handlers."""
+    async def started(self):
+        """
+        Wait until the Deepgram connection is established.
+        """
+        if self._connected_once.is_set():
+            return
+        await asyncio.wait_for(
+            self._connected_once.wait(), timeout=self._connection_timeout
+        )
+    async def close(self):
+        """Close the Deepgram connection and clean up resources."""
         if self._is_closed:
-            logger.warning("Cannot setup connection - Deepgram instance is closed")
+            logger.debug("Deepgram STT service already closed")
             return
-        if self.dg_connection is not None:
-            logger.debug("Connection already set up, skipping initialization")
+        logger.info("Closing Deepgram STT service")
+        self._is_closed = True
+        # Close the Deepgram connection if it exists
+        if self.dg_connection:
+            logger.debug("Closing Deepgram connection")
+            try:
+                await self.dg_connection.send_control(
+                    ListenV1ControlMessage(type="CloseStream")
+                )
+                await self._stack.aclose()
+                self.dg_connection = None
+            except Exception:
+                logger.exception("Error closing Deepgram connection")
+    async def _on_message(
+        self,
+        message: ListenV1ResultsEvent
+        | ListenV1MetadataEvent
+        | ListenV1UtteranceEndEvent
+        | ListenV1SpeechStartedEvent,
+    ):
+        if message.type != "Results":
+            logger.debug(
+                "Received non-transcript message, skip processing. message=%s", message
+            )
             return
-        try:
-            # Use the newer websocket interface instead of deprecated live
-            logger.debug("Setting up Deepgram WebSocket connection")
-            self.dg_connection = self.deepgram.listen.websocket.v("1")
-            assert self.dg_connection is not None
-            # Handler for transcript results
-            def handle_transcript(conn, result=None):
-                try:
-                    # Update the last activity time
-                    self.last_activity_time = time.time()
-                    # Check if result is already a dict (from LiveResultResponse or test mocks)
-                    if isinstance(result, dict):
-                        transcript = result
-                    elif hasattr(result, "to_dict"):
-                        transcript = result.to_dict()
-                    elif hasattr(result, "to_json"):
-                        transcript = json.loads(result.to_json())
-                    elif isinstance(result, (str, bytes, bytearray)):
-                        transcript = json.loads(result)
-                    else:
-                        logger.warning(
-                            "Unrecognized transcript format: %s", type(result)
-                        )
-                        return
-                    # Get the transcript text from the response
-                    alternatives = transcript.get("channel", {}).get("alternatives", [])
-                    if not alternatives:
-                        return
-                    transcript_text = alternatives[0].get("transcript", "")
-                    if not transcript_text:
-                        return
-                    # Check if this is a final result
-                    is_final = transcript.get("is_final", False)
-                    # Create metadata with useful information
-                    metadata = {
-                        "confidence": alternatives[0].get("confidence", 0),
-                        "words": alternatives[0].get("words", []),
-                        "is_final": is_final,
-                        "channel_index": transcript.get("channel_index", 0),
-                    }
-                    # Handle the result (both collect and emit)
-                    self._handle_transcript_result(is_final, transcript_text, metadata)
-                    logger.debug(
-                        "Received transcript",
-                        extra={
-                            "is_final": is_final,
-                            "text_length": len(transcript_text),
-                            "confidence": metadata["confidence"],
-                        },
-                    )
-                except Exception as e:
-                    logger.error("Error processing transcript", exc_info=e)
-                    # Emit error immediately
-                    self._emit_error_event(e, "Deepgram transcript processing")
-            # Handler for errors
-            def handle_error(conn, error=None):
-                # Update the last activity time
-                self.last_activity_time = time.time()
-                error_text = str(error) if error is not None else "Unknown error"
-                logger.error("Deepgram error received: %s", error_text)
+        transcript = message.dict()
-                # Emit error immediately
-                error_obj = Exception(f"Deepgram error: {error_text}")
-                self._emit_error_event(error_obj, "Deepgram connection")
+        # Get the transcript text from the response
+        alternatives = transcript.get("channel", {}).get("alternatives", [])
+        if not alternatives:
+            return
-            # Register event handlers directly
-            self.dg_connection.on(LiveTranscriptionEvents.Transcript, handle_transcript)
-            self.dg_connection.on(LiveTranscriptionEvents.Error, handle_error)
+        transcript_text = alternatives[0].get("transcript", "")
+        if not transcript_text:
+            return
-            # Start the connection
-            logger.info("Starting Deepgram connection with options %s", self.options)
-            self.dg_connection.start(self.options)
+        # Check if this is a final result
+        is_final = transcript.get("is_final", False)
-        except Exception as e:
-            # Log the error and set connection to None
-            logger.error("Error setting up Deepgram connection", exc_info=e)
-            self.dg_connection = None
-            # Emit error immediately
-            self._emit_error_event(e, "Deepgram connection setup")
+        # Create metadata with useful information
+        metadata = {
+            "confidence": alternatives[0].get("confidence", 0),
+            "words": alternatives[0].get("words", []),
+            "is_final": is_final,
+            "channel_index": transcript.get("channel_index", 0),
+        }
+        # Emit immediately for real-time responsiveness
+        if is_final:
+            self._emit_transcript_event(transcript_text, self._current_user, metadata)
+        else:
+            self._emit_partial_transcript_event(
+                transcript_text, self._current_user, metadata
+            )
+        logger.debug(
+            "Received transcript",
+            extra={
+                "is_final": is_final,
+                "text_length": len(transcript_text),
+                "confidence": metadata["confidence"],
+            },
+        )
+    async def _on_connection_error(self, error: websockets.WebSocketException):
+        error_text = str(error) if error is not None else "Unknown error"
+        logger.error("Deepgram error received: %s", error_text)
+        # Emit error immediately
+        error_obj = Exception(f"Deepgram error: {error_text}")
+        self._emit_error_event(error_obj, "Deepgram connection")
+    async def _on_connection_close(self, message: Any):
+        logger.warning(f"Deepgram connection closed. message={message}")
+        await self.close()
     async def _process_audio_impl(
-        self, pcm_data: PcmData, user_metadata: Optional[Union[Dict[str, Any], "Participant"]] = None
+        self,
+        pcm_data: PcmData,
+        user_metadata: Optional[Union[Dict[str, Any], "Participant"]] = None,
     ) -> Optional[List[Tuple[bool, str, Dict[str, Any]]]]:
         """
         Process audio data through Deepgram for transcription.
@@ -233,44 +293,57 @@ class STT(stt.STT):
                 self.sample_rate,
             )
-        # Update the last activity time
-        self.last_activity_time = time.time()
         # Convert PCM data to bytes if needed
         audio_data = pcm_data.samples
         if not isinstance(audio_data, bytes):
             # Convert numpy array to bytes
             audio_data = audio_data.astype(np.int16).tobytes()
-        # Send the audio data to Deepgram
+        # Wait for the attempt to establish the connection
         try:
-            logger.debug(
-                "Sending audio data to Deepgram",
-                extra={"audio_bytes": len(audio_data)},
+            await self.started()
+        except asyncio.TimeoutError:
+            logger.error(
+                f"Deepgram connection is not established within {self._connection_timeout} seconds. "
+                f"Skipping the audio package."
             )
-            assert self.dg_connection is not None
-            self.dg_connection.send(audio_data)
-        except Exception as e:
-            # Raise exception to be handled by base class
-            raise Exception(f"Deepgram audio transmission error: {e}")
+            return None
-        # Return None for asynchronous mode - events are emitted when they arrive
+        # Send the audio data to Deepgram
+        logger.debug(
+            "Sending audio data to Deepgram",
+            extra={"audio_bytes": len(audio_data)},
+        )
+        await self._send_audio(audio_data)
         return None
-    async def close(self):
-        """Close the Deepgram connection and clean up resources."""
-        if self._is_closed:
-            logger.debug("Deepgram STT service already closed")
+    async def _send_audio(self, data: bytes):
+        if self.dg_connection is None:
+            logger.warning("Deepgram connection is not established")
             return
-        logger.info("Closing Deepgram STT service")
-        self._is_closed = True
+        try:
+            await self.dg_connection.send_media(data)
+            self._last_sent_at = time.time()
+        except Exception as e:
+            # Raise exception to be handled by base class
+            raise Exception(f"Deepgram audio transmission error: {e}") from e
-        # Close the Deepgram connection if it exists
-        if self.dg_connection:
-            logger.debug("Closing Deepgram connection")
-            try:
-                self.dg_connection.finish()
-                self.dg_connection = None
-            except Exception as e:
-                logger.error("Error closing Deepgram connection", exc_info=e)
+    async def _keepalive_loop(self):
+        """
+        Send the silence audio every `interval` seconds
+        to prevent Deepgram from closing the connection.
+        """
+        while not self._is_closed and self.dg_connection is not None:
+            if self._last_sent_at + self._keep_alive_interval <= time.time():
+                logger.debug("Sending keepalive packet to Deepgram...")
+                # Send audio silence to keep the connection open
+                await self._send_audio(self._keep_alive_data)
+                # Send keep-alive message as well
+                await self.dg_connection.send_control(
+                    ListenV1ControlMessage(type="KeepAlive")
+                )
+            # Sleep max for 1s to avoid missing the keep-alive schedule
+            timeout = min(self._keep_alive_interval, 1.0)
+            await asyncio.sleep(timeout)
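
The keep-alive logic added in this file sends silence only when nothing has been sent for `keep_alive_interval` seconds, and polls at most once per second. The sketch below isolates that scheduling idea with a fake sender so it can run without Deepgram. `KeepAliveSketch`, `fake_send`, and `demo` are hypothetical names, and the sketch uses `time.monotonic()` where the diff uses `time.time()`.

```python
import asyncio
import time


class KeepAliveSketch:
    """Stand-in for the idle-triggered keep-alive scheduling shown in the diff."""

    def __init__(self, send, interval: float = 1.0):
        self._send = send
        self._interval = interval
        # -inf makes the first check fire immediately.
        self._last_sent_at = float("-inf")
        self.closed = False

    def is_due(self, now: float) -> bool:
        return self._last_sent_at + self._interval <= now

    async def send_audio(self, data: bytes) -> None:
        # Any send (real audio or silence) resets the idle timer.
        await self._send(data)
        self._last_sent_at = time.monotonic()

    async def run(self, silence: bytes) -> None:
        while not self.closed:
            if self.is_due(time.monotonic()):
                await self.send_audio(silence)
            # Sleep for at most 1 s so a long interval is not overshot by much.
            await asyncio.sleep(min(self._interval, 1.0))


async def demo() -> list:
    sent = []

    async def fake_send(data: bytes) -> None:
        sent.append(data)

    keepalive = KeepAliveSketch(fake_send, interval=0.01)
    task = asyncio.create_task(keepalive.run(b"\x00\x00"))
    await asyncio.sleep(0.05)
    keepalive.closed = True
    await task
    return sent
```

Because real audio sends also update the timestamp, the loop stays quiet while audio is flowing and only fills gaps. Note also that the constructor in the diff calls `asyncio.create_task`, which raises `RuntimeError` when no event loop is running, so the class has to be instantiated from within async code.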

vision_agents/plugins/deepgram/utils.py ADDED

@@ -0,0 +1,18 @@
+import numpy as np
+def generate_silence(sample_rate: int, duration_ms: int) -> bytes:
+    """
+    Generate a silence of the given sample_rate and duration_ms.
+    """
+    # Audio parameters
+    channels = 1
+    sample_format = np.int16  # 16-bit signed PCM
+    # Number of samples = sample_rate * duration_seconds
+    num_samples = int(sample_rate * (duration_ms / 1000.0))
+    # Create silence raw bytes (s16 mono PCM)
+    pcm_bytes = np.zeros((num_samples, channels), dtype=sample_format).tobytes()
+    return pcm_bytes
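
To sanity-check the size of the keep-alive payload, the same arithmetic can be reproduced without NumPy. This is a stdlib-only sketch, assuming 16-bit mono PCM as in the function above; `silence_bytes` is a hypothetical name and not part of the package.

```python
def silence_bytes(sample_rate: int, duration_ms: int, channels: int = 1) -> bytes:
    """Return zero-filled s16 PCM of the given duration."""
    bytes_per_sample = 2  # 16-bit signed samples
    num_samples = int(sample_rate * (duration_ms / 1000.0))
    # bytes(n) creates n zero bytes, matching an all-zero int16 array.
    return bytes(num_samples * channels * bytes_per_sample)


payload = silence_bytes(sample_rate=48000, duration_ms=10)
print(len(payload))  # 480 samples * 2 bytes each
```

With the constructor defaults in this diff (48000 Hz, 10 ms), each keep-alive packet is 480 samples, or 960 bytes.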

{vision_agents_plugins_deepgram-0.1.5.dist-info → vision_agents_plugins_deepgram-0.1.6.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: vision-agents-plugins-deepgram
-Version: 0.1.5
+Version: 0.1.6
 Summary: Deepgram STT integration for Vision Agents
 Project-URL: Documentation, https://visionagents.ai/
 Project-URL: Website, https://visionagents.ai/
@@ -8,31 +8,32 @@ Project-URL: Source, https://github.com/GetStream/Vision-Agents
 License-Expression: MIT
 Keywords: AI,STT,agents,deepgram,speech-to-text,transcription,voice agents
 Requires-Python: >=3.10
-Requires-Dist: deepgram-sdk==4.8.1
+Requires-Dist: deepgram-sdk<5.1,>=5.0.0
 Requires-Dist: numpy<2.3,>=2.2.6
 Requires-Dist: vision-agents
 Description-Content-Type: text/markdown
 # Deepgram Speech-to-Text Plugin
-A high-quality Speech-to-Text (STT) plugin for GetStream that uses the Deepgram API.
+A high-quality Speech-to-Text (STT) plugin for Vision agents that uses the Deepgram API.
 ## Installation
 ```bash
-pip install getstream-plugins-deepgram
+uv add vision-agents-plugins-deepgram
 ```
 ## Usage
 ```python
-from getstream.plugins.deepgram import DeepgramSTT
+from vision_agents.plugins import deepgram
+from getstream.video.rtc.track_util import PcmData
 # Initialize with API key from environment variable
-stt = DeepgramSTT()
+stt = deepgram.STT()
 # Or specify API key directly
-stt = DeepgramSTT(api_key="your_deepgram_api_key")
+stt = deepgram.STT(api_key="your_deepgram_api_key")
 # Register event handlers
 @stt.on("transcript")
@@ -44,6 +45,7 @@ def on_partial(text, user, metadata):
     print(f"Partial transcript from {user}: {text}")
 # Process audio
+pcm_data = PcmData(samples=b"\x00\x00" * 1000, sample_rate=48000, format="s16")
 await stt.process_audio(pcm_data)
 # When done
@@ -52,14 +54,16 @@ await stt.close()
 ## Configuration Options
-- `api_key`: Deepgram API key (default: reads from DEEPGRAM_API_KEY environment variable)
-- `options`: Deepgram LiveOptions for configuring the transcription
+- `api_key`: Deepgram API key (default: reads from `DEEPGRAM_API_KEY` environment variable)
+- `options`: Deepgram options for configuring the transcription.
+See the [Deepgram Listen V1 Connect API documentation](https://github.com/deepgram/deepgram-python-sdk/blob/main/websockets-reference.md#%EF%B8%8F-parameters) for more details.
 - `sample_rate`: Sample rate of the audio in Hz (default: 16000)
 - `language`: Language code for transcription (default: "en-US")
-- `keep_alive_interval`: Interval in seconds to send keep-alive messages (default: 5.0)
+- `keep_alive_interval`: Interval in seconds to send keep-alive messages (default: 1.0s)
+- `connection_timeout`: Timeout to wait for the Deepgram connection to be established before skipping the  in seconds to send keep-alive messages (default: 15.0s)
 ## Requirements
 - Python 3.10+
-- deepgram-sdk>=4.5.0
+- deepgram-sdk>=5.0.0,<5.1
 - numpy>=2.2.6,<2.3

vision_agents_plugins_deepgram-0.1.6.dist-info/RECORD ADDED

@@ -0,0 +1,13 @@
+./.gitignore,sha256=S6wPCu4rBDB_yyTYoXbMIR-pn4OPv6b3Ulnx1n5RWvo,916
+./PKG-INFO,sha256=Dk3w-R0OZAg3vtpBAH04f7XZLFDGVbGHopiZUjngTiQ,2273
+./README.md,sha256=CX3wmR5ztY0crI5VSmBt2K0vBVjFvEhBr-SNuycL1Uc,1717
+./pyproject.toml,sha256=W6nptgCD5B-Nmob7_af6knTNrXDRWAT-BPaGKzHVXHY,1102
+./vision_agents/plugins/deepgram/__init__.py,sha256=iBBsZvcyd4KfkcUHsi1QiVVQnPEKvAweGZ40eHeENs4,159
+./vision_agents/plugins/deepgram/stt.py,sha256=I2eNU_O_xAX5rDJufm-ooVvF4kYxOrPh0_F2i8diYWY,13124
+./vision_agents/plugins/deepgram/utils.py,sha256=7xcGxnhcuVpqHIp1F_d1ARTq6y0jQGZsPx_2hwBifZ0,527
+vision_agents/plugins/deepgram/__init__.py,sha256=iBBsZvcyd4KfkcUHsi1QiVVQnPEKvAweGZ40eHeENs4,159
+vision_agents/plugins/deepgram/stt.py,sha256=I2eNU_O_xAX5rDJufm-ooVvF4kYxOrPh0_F2i8diYWY,13124
+vision_agents/plugins/deepgram/utils.py,sha256=7xcGxnhcuVpqHIp1F_d1ARTq6y0jQGZsPx_2hwBifZ0,527
+vision_agents_plugins_deepgram-0.1.6.dist-info/METADATA,sha256=Dk3w-R0OZAg3vtpBAH04f7XZLFDGVbGHopiZUjngTiQ,2273
+vision_agents_plugins_deepgram-0.1.6.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
+vision_agents_plugins_deepgram-0.1.6.dist-info/RECORD,,

vision_agents_plugins_deepgram-0.1.5.dist-info/RECORD DELETED

@@ -1,11 +0,0 @@
-./.gitignore,sha256=S6wPCu4rBDB_yyTYoXbMIR-pn4OPv6b3Ulnx1n5RWvo,916
-./PKG-INFO,sha256=KWHYHyxCwhi8_0YAo0-QYJcLARlbvrck6g0WsmeFtlQ,1793
-./README.md,sha256=RQMD14Xdhof5KIHFkJe0GK4lomoyijCDiBpdt9RG5Bk,1242
-./pyproject.toml,sha256=lWrmuNRybdSuN1cKoRDwW20J4gQ-FPnrSs0AUt3z5Dk,1097
-./vision_agents/plugins/deepgram/__init__.py,sha256=iBBsZvcyd4KfkcUHsi1QiVVQnPEKvAweGZ40eHeENs4,159
-./vision_agents/plugins/deepgram/stt.py,sha256=jMMIAG8NkBB5CkH-MmJX1KwlUTbmapOcdDBiS4jddCI,11151
-vision_agents/plugins/deepgram/__init__.py,sha256=iBBsZvcyd4KfkcUHsi1QiVVQnPEKvAweGZ40eHeENs4,159
-vision_agents/plugins/deepgram/stt.py,sha256=jMMIAG8NkBB5CkH-MmJX1KwlUTbmapOcdDBiS4jddCI,11151
-vision_agents_plugins_deepgram-0.1.5.dist-info/METADATA,sha256=KWHYHyxCwhi8_0YAo0-QYJcLARlbvrck6g0WsmeFtlQ,1793
-vision_agents_plugins_deepgram-0.1.5.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
-vision_agents_plugins_deepgram-0.1.5.dist-info/RECORD,,

{vision_agents_plugins_deepgram-0.1.5.dist-info → vision_agents_plugins_deepgram-0.1.6.dist-info}/WHEEL RENAMED

File without changes
