PyPI - intellema-vdk - Versions diffs - 0.1.0__tar.gz → 0.2.0__tar.gz - Mend

intellema-vdk 0.1.0tar.gz → 0.2.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (28)

{intellema_vdk-0.1.0 → intellema_vdk-0.2.0}/MANIFEST.in RENAMED

@@ -2,3 +2,4 @@ include README.md
 include requirements.txt
 include LICENSE
 recursive-include intellema_vdk *
+recursive-exclude intellema_vdk/agent_api *
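
The `recursive-exclude` rule above is meant to keep `intellema_vdk/agent_api` out of the source distribution. A minimal sketch of how one could check a built archive for leaked paths, using only the standard library; the helper names and the in-memory demo archive are hypothetical and are not part of the package:

```python
import io
import tarfile
from typing import List


def members_under(sdist_bytes: bytes, prefix: str) -> List[str]:
    """Return archive member names that contain the given path fragment."""
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as tar:
        return [name for name in tar.getnames() if prefix in name]


def build_demo_sdist(paths: List[str]) -> bytes:
    """Build a tiny in-memory .tar.gz of empty files (demo data only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for path in paths:
            tar.addfile(tarfile.TarInfo(name=path), io.BytesIO(b""))
    return buffer.getvalue()


# Hypothetical archive layout standing in for a real sdist.
demo = build_demo_sdist([
    "intellema_vdk-0.2.0/intellema_vdk/__init__.py",
    "intellema_vdk-0.2.0/intellema_vdk/speech_lib/stt_client.py",
])
leaked = members_under(demo, "intellema_vdk/agent_api/")
print(leaked)
```

For a real check, the bytes of the downloaded `.tar.gz` would replace the demo archive; an empty result suggests the exclude rule took effect.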

{intellema_vdk-0.1.0/intellema_vdk.egg-info → intellema_vdk-0.2.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: intellema-vdk
-Version: 0.1.0
+Version: 0.2.0
 Summary: A Voice Development Kit for different Voice Agent Platforms
 Author: Intellema
 License: MIT License
@@ -37,6 +37,12 @@ Requires-Dist: boto3>=1.28.0
 Requires-Dist: twilio
 Requires-Dist: retell-sdk
 Requires-Dist: requests
+Requires-Dist: openai
+Requires-Dist: httpx
+Requires-Dist: pyaudio
+Requires-Dist: together
+Requires-Dist: langchain-openai
+Requires-Dist: langchain-core
 Dynamic: license-file
 # Intellema VDK
@@ -100,6 +106,73 @@ from intellema_vdk import start_outbound_call
 await start_outbound_call("livekit", phone_number="+1...")
 ```
+## Speech To Text (STT)
+The `STTManager` class provides an interface for transcribing audio files using OpenAI's Whisper model and optionally posting the transcribed text to a specified agent API.
+### Usage
+Here's how to use the `STTManager` to transcribe an audio file and post the result:
+Make sure `OPENAI_API_KEY` and `AGENT_API_URL` are set in your `.env` file.
+```python
+import asyncio
+from intellema_vdk import STTManager
+async def main():
+    # 1- Initialize the STTManager
+    stt_manager = STTManager()
+    try:
+        # 2- Transcribe an audio file and post the result to your agent API URL (if provided)
+        # Replace "path/to/your/audio.mp3" with the actual file path
+        transcript = await stt_manager.transcribe_and_post("path/to/your/audio.mp3")
+        print(f"Transcription: {transcript}")
+    except FileNotFoundError:
+        print("The audio file was not found.")
+    except Exception as e:
+        print(f"An error occurred: {e}")
+    finally:
+        # 3- Clean up
+        await stt_manager.close()
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+## TTS Streaming
+The `TTSStreamer` class provides low-latency text-to-speech streaming using Together AI's inference engine. It enables real-time voice synthesis from streaming LLM responses.
+### Running the Sample Implementation
+We provide a ready-to-use sample that connects LangChain (OpenAI) with the TTS Streamer.
+1.  **Configure Keys**: Ensure `OPENAI_API_KEY` and `TOGETHER_API_KEY` are set in your `.env`.
+2.  **Run the script**:
+    ```bash
+    python sample_implementation.py
+    ```
+### Library Usage
+You can integrate the streamer into your own loops:
+```python
+from intellema_vdk import TTSStreamer
+# 1. Initialize per turn
+tts = TTSStreamer()
+# 2. Feed text chunks as they are generated
+for chunk in llm_response_stream:
+    tts.feed(chunk)
+# 3. Flush and clean up
+tts.flush()
+tts.close()
+```
 ## Configuration
@@ -115,6 +188,34 @@ TWILIO_AUTH_TOKEN=your-token
 TWILIO_PHONE_NUMBER=your-number
 RETELL_API_KEY=your-retell-key
 RETELL_AGENT_ID=your-agent-id
+TOGETHER_API_KEY=your-together-key
+OPENAI_API_KEY=your-openai-key
+AGENT_API_URL=https://your-agent-api.com/endpoint
 ```
+## Retell Setup
+**Important:** Before initiating calls with Retell, you must register your Twilio phone number with Retell. This binds your agent to the number and allows Retell to handle the call flow.
+You can register your number in two ways:
+1.  **Using the Helper Script:**
+    We provide an interactive script to guide you through the process:
+    ```bash
+    python import_phone_number.py
+    ```
+2.  **Programmatically:**
+    ```python
+    from intellema_vdk.retell_lib.retell_client import RetellManager
+    manager = RetellManager()
+    # Optional: Pass termination_uri if you have a SIP trunk
+    manager.import_phone_number(nickname="My Twilio Number")
+    ```
+## Notes
+- **Retell `delete_room` Limitation**: The `delete_room` method for Retell works by updating a dynamic variable that the agent checks during the conversation loop. As a result, it **only takes effect after the user says something**, which prompts the agent to check the variable and terminate the call.
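
The TTS Streaming section added in this hunk shows a `feed` / `flush` / `close` interface but not how text is buffered between calls. One plausible approach, sketched here purely as an assumption (the diff shown does not include `tts_streamer.py`, and `SentenceBuffer` is a hypothetical name), is to hold chunks until a sentence ends, so that each synthesis request covers a complete sentence:

```python
from typing import List

# Naive sentence boundaries; abbreviations and decimals would need more care.
SENTENCE_ENDINGS = (".", "!", "?")


class SentenceBuffer:
    """Collects streamed text and releases it one complete sentence at a time."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, chunk: str) -> List[str]:
        """Add a chunk and return any sentences it completed."""
        self._pending += chunk
        sentences = []
        start = 0
        for index, char in enumerate(self._pending):
            if char in SENTENCE_ENDINGS:
                sentence = self._pending[start:index + 1].strip()
                if sentence:
                    sentences.append(sentence)
                start = index + 1
        # Keep only the unfinished tail for the next call.
        self._pending = self._pending[start:]
        return sentences

    def flush(self) -> List[str]:
        """Return whatever text is left, even if it is not a full sentence."""
        rest = self._pending.strip()
        self._pending = ""
        return [rest] if rest else []
```

Under this design, `feed` adds no latency beyond the current sentence, and `flush` makes sure a trailing fragment without punctuation is still spoken at the end of a turn.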

intellema_vdk-0.2.0/README.md ADDED

@@ -0,0 +1,174 @@
+# Intellema VDK
+Intellema VDK is a unified Voice Development Kit designed to simplify the integration and management of various voice agent platforms. It provides a consistent, factory-based API to interact with providers like LiveKit and Retell AI, enabling developers to build scalable voice applications with ease. Whether you need real-time streaming, outbound calling, or participant management, Intellema VDK abstracts the complexity into a single, intuitive interface.
+## Features
+- **Room Management**: Create and delete rooms dynamically.
+- **Participant Management**: Generate tokens, kick users, and mute tracks.
+- **SIP Outbound Calling**: Initiate calls to phone numbers via SIP trunks.
+- **Streaming & Recording**: Stream to RTMP destinations and record room sessions directly to AWS S3.
+- **Real-time Alerts**: Send data packets (alerts) to participants.
+## Prerequisites
+- Python 3.8+
+- A SIP Provider (for outbound calls)
+## Installation
+```bash
+pip install intellema-vdk
+```
+## Usage
+### Unified Wrapper (Factory Pattern)
+The recommended way to use the library is via the `VoiceClient` factory:
+```python
+import asyncio
+from intellema_vdk import VoiceClient
+async def main():
+    # 1. Initialize the client
+    client = VoiceClient("livekit")
+    # 2. Use methods directly
+    call_id = await client.start_outbound_call(
+        phone_number="+15551234567",
+        prompt_content="Hello from LiveKit"
+    )
+    # 3. Clean API calls
+    await client.mute_participant(call_id, "user-1", "track-1", True)
+    await client.close()
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+### Convenience Function
+For quick one-off calls, you can still use the helper:
+```python
+from intellema_vdk import start_outbound_call
+await start_outbound_call("livekit", phone_number="+1...")
+```
+## Speech To Text (STT)
+The `STTManager` class provides an interface for transcribing audio files using OpenAI's Whisper model and optionally posting the transcribed text to a specified agent API.
+### Usage
+Here's how to use the `STTManager` to transcribe an audio file and post the result:
+Make sure `OPENAI_API_KEY` and `AGENT_API_URL` are set in your `.env` file.
+```python
+import asyncio
+from intellema_vdk import STTManager
+async def main():
+    # 1- Initialize the STTManager
+    stt_manager = STTManager()
+    try:
+        # 2- Transcribe an audio file and post the result to your agent API URL (if provided)
+        # Replace "path/to/your/audio.mp3" with the actual file path
+        transcript = await stt_manager.transcribe_and_post("path/to/your/audio.mp3")
+        print(f"Transcription: {transcript}")
+    except FileNotFoundError:
+        print("The audio file was not found.")
+    except Exception as e:
+        print(f"An error occurred: {e}")
+    finally:
+        # 3- Clean up
+        await stt_manager.close()
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+## TTS Streaming
+The `TTSStreamer` class provides low-latency text-to-speech streaming using Together AI's inference engine. It enables real-time voice synthesis from streaming LLM responses.
+### Running the Sample Implementation
+We provide a ready-to-use sample that connects LangChain (OpenAI) with the TTS Streamer.
+1.  **Configure Keys**: Ensure `OPENAI_API_KEY` and `TOGETHER_API_KEY` are set in your `.env`.
+2.  **Run the script**:
+    ```bash
+    python sample_implementation.py
+    ```
+### Library Usage
+You can integrate the streamer into your own loops:
+```python
+from intellema_vdk import TTSStreamer
+# 1. Initialize per turn
+tts = TTSStreamer()
+# 2. Feed text chunks as they are generated
+for chunk in llm_response_stream:
+    tts.feed(chunk)
+# 3. Flush and clean up
+tts.flush()
+tts.close()
+```
+## Configuration
+Create a `.env` file in the root directory:
+```bash
+LIVEKIT_URL=wss://your-livekit-domain.com
+LIVEKIT_API_KEY=your-key
+LIVEKIT_API_SECRET=your-secret
+SIP_OUTBOUND_TRUNK_ID=your-trunk-id
+TWILIO_ACCOUNT_SID=your-sid
+TWILIO_AUTH_TOKEN=your-token
+TWILIO_PHONE_NUMBER=your-number
+RETELL_API_KEY=your-retell-key
+RETELL_AGENT_ID=your-agent-id
+TOGETHER_API_KEY=your-together-key
+OPENAI_API_KEY=your-openai-key
+AGENT_API_URL=https://your-agent-api.com/endpoint
+```
+## Retell Setup
+**Important:** Before initiating calls with Retell, you must register your Twilio phone number with Retell. This binds your agent to the number and allows Retell to handle the call flow.
+You can register your number in two ways:
+1.  **Using the Helper Script:**
+    We provide an interactive script to guide you through the process:
+    ```bash
+    python import_phone_number.py
+    ```
+2.  **Programmatically:**
+    ```python
+    from intellema_vdk.retell_lib.retell_client import RetellManager
+    manager = RetellManager()
+    # Optional: Pass termination_uri if you have a SIP trunk
+    manager.import_phone_number(nickname="My Twilio Number")
+    ```
+## Notes
+- **Retell `delete_room` Limitation**: The `delete_room` method for Retell works by updating a dynamic variable that the agent checks during the conversation loop. As a result, it **only takes effect after the user says something**, which prompts the agent to check the variable and terminate the call.
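
The README describes `VoiceClient` as a factory that returns a provider-specific manager from a provider name. A minimal sketch of that pattern, assuming a simple name-to-class registry; the body of the real `VoiceClient` is not visible in this diff, and the `Fake...` classes and `make_voice_client` are hypothetical stand-ins:

```python
from typing import Any, Callable, Dict


class FakeLiveKitManager:
    """Stand-in for a provider class; not the real LiveKitManager."""

    def __init__(self, **kwargs: Any) -> None:
        self.options = kwargs


class FakeRetellManager:
    """Stand-in for a provider class; not the real RetellManager."""

    def __init__(self, **kwargs: Any) -> None:
        self.options = kwargs


# Registry mapping a provider name to the callable that builds its manager.
_PROVIDERS: Dict[str, Callable[..., Any]] = {
    "livekit": FakeLiveKitManager,
    "retell": FakeRetellManager,
}


def make_voice_client(provider: str, **kwargs: Any) -> Any:
    """Look up the provider (case-insensitively) and build its manager."""
    key = provider.strip().lower()
    try:
        factory = _PROVIDERS[key]
    except KeyError:
        supported = ", ".join(sorted(_PROVIDERS))
        raise ValueError(
            f"Unknown provider {provider!r}; expected one of: {supported}"
        ) from None
    return factory(**kwargs)
```

A registry keeps the lookup in one place, so supporting another provider would only require a new dictionary entry, and an unknown name fails with a message that lists the valid choices.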

{intellema_vdk-0.1.0 → intellema_vdk-0.2.0}/intellema_vdk/__init__.py RENAMED

@@ -1,12 +1,9 @@
 from typing import Optional, List, Any
-import os
-from dotenv import load_dotenv
-# Load environment variables
-load_dotenv()
 from .livekit_lib.client import LiveKitManager
 from .retell_lib.retell_client import RetellManager
+from .speech_lib.stt_client import STTManager
+from .speech_lib.tts_streamer import TTSStreamer
 def VoiceClient(provider: str, **kwargs) -> Any:
     """

intellema_vdk-0.2.0/intellema_vdk/retell_lib/import_phone_number.py ADDED

@@ -0,0 +1,73 @@
+import os
+import sys
+# Add the project root to the python path so we can import intellema_vdk
+sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
+from intellema_vdk.retell_lib.retell_client import RetellManager
+def import_twilio_number():
+    """
+    Import your Twilio phone number to Retell.
+    This is required before you can make outbound calls using Retell.
+    """
+    try:
+        manager = RetellManager()
+        print("=== Retell Phone Number Import ===\n")
+        print(f"Phone Number to import: {manager.twilio_number}")
+        print(f"Agent ID to bind: {manager.retell_agent_id}\n")
+        # Ask if user has a Twilio SIP trunk
+        print("Do you have a Twilio Elastic SIP Trunk configured?")
+        print("If you're not sure, you can:")
+        print("  1. Visit: https://console.twilio.com/us1/develop/voice/manage/trunks")
+        print("  2. Or just press Enter to try without it (may not work for some setups)\n")
+        has_trunk = input("Do you have a SIP trunk? (y/n, default: n): ").strip().lower()
+        termination_uri = None
+        sip_username = None
+        sip_password = None
+        if has_trunk == 'y':
+            print("\nEnter your Twilio SIP Trunk Termination URI.")
+            print("Format: yourtrunkname.pstn.twilio.com")
+            print("You can find this in Twilio Console > Elastic SIP Trunking > Your Trunk > Termination")
+            termination_uri = input("Termination URI: ").strip()
+            print("\nDo you use Credential List authentication? (Recommended)")
+            has_creds = input("Use credentials? (y/n, default: y): ").strip().lower() or 'y'
+            if has_creds == 'y':
+                print("Enter the username/password from your Twilio Credential List:")
+                sip_username = input("Username: ").strip()
+                sip_password = input("Password: ").strip()
+        # Optional nickname
+        nickname = input("\nOptional: Enter a nickname for this number (press Enter to skip): ").strip() or None
+        print(f"\n=== Importing Phone Number ===")
+        response = manager.import_phone_number(
+            termination_uri=termination_uri,
+            nickname=nickname,
+            sip_trunk_auth_username=sip_username,
+            sip_trunk_auth_password=sip_password
+        )
+        print(f"\n=== Import Successful! ===")
+        print(f"You can now use this number to make outbound calls via Retell.")
+        return response
+    except Exception as e:
+        print(f"\n✗ Import failed: {e}")
+        print(f"\nTroubleshooting:")
+        print(f"  1. If you don't have a SIP trunk, you may need to purchase the number through Retell")
+        print(f"  2. Visit Retell dashboard: https://app.retellai.com/")
+        print(f"  3. Or create a Twilio Elastic SIP Trunk first")
+        raise
+if __name__ == "__main__":
+    import_twilio_number()
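
The script above handles its two yes/no prompts differently: an empty answer means "no" for the SIP trunk question and "yes" for the credentials question, and any unrecognized input is silently treated as "no". A small sketch of how that parsing could be centralized and made stricter; `parse_yes_no` is a hypothetical helper and not part of the package:

```python
def parse_yes_no(answer: str, default: bool) -> bool:
    """Map a typed answer to a boolean, using the default for empty input."""
    normalized = answer.strip().lower()
    if not normalized:
        return default
    if normalized in ("y", "yes"):
        return True
    if normalized in ("n", "no"):
        return False
    # Reject anything else instead of guessing what the user meant.
    raise ValueError(f"Expected y or n, got {answer!r}")
```

With a helper like this, the trunk prompt would pass `default=False` and the credentials prompt `default=True`, which matches the defaults the script advertises in its prompts.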

intellema_vdk-0.2.0/intellema_vdk/retell_lib/retell_client.py ADDED

@@ -0,0 +1,248 @@
+import os
+from typing import List, Optional
+from dotenv import load_dotenv
+from twilio.rest import Client
+from retell import Retell
+import time
+import uuid
+import requests
+import boto3
+# Load environment variables
+load_dotenv(dotenv_path=".env.local")
+load_dotenv()
+class RetellManager:
+    def __init__(self):
+        self.twilio_account_sid = os.getenv("TWILIO_ACCOUNT_SID")
+        self.twilio_auth_token = os.getenv("TWILIO_AUTH_TOKEN")
+        self.twilio_number = os.getenv("TWILIO_PHONE_NUMBER")
+        self.retell_api_key = os.getenv("RETELL_API_KEY")
+        self.retell_agent_id = os.getenv("RETELL_AGENT_ID")
+        if not all([self.twilio_account_sid, self.twilio_auth_token, self.twilio_number, self.retell_api_key, self.retell_agent_id]):
+            raise ValueError("Missing necessary environment variables for RetellManager")
+        self.twilio_client = Client(self.twilio_account_sid, self.twilio_auth_token)
+        self.retell_client = Retell(api_key=self.retell_api_key)
+    def import_phone_number(self, termination_uri: str = None, outbound_agent_id: str = None, inbound_agent_id: str = None, nickname: str = None, sip_trunk_auth_username: str = None, sip_trunk_auth_password: str = None):
+        """
+        Import/register your Twilio phone number with Retell.
+        This is required before you can make outbound calls using the phone number.
+        Args:
+            termination_uri: Twilio SIP trunk termination URI (e.g., "yourtrunk.pstn.twilio.com").
+                           If not provided, it is omitted from the import request.
+            outbound_agent_id: Agent ID to use for outbound calls. Defaults to self.retell_agent_id.
+            inbound_agent_id: Agent ID to use for inbound calls. Defaults to None (no inbound).
+            nickname: Optional nickname for the phone number.
+            sip_trunk_auth_username: Username for SIP trunk authentication (if using credential list).
+            sip_trunk_auth_password: Password for SIP trunk authentication (if using credential list).
+        Returns:
+            The phone number registration response from Retell.
+        """
+        # Build the import kwargs
+        import_kwargs = {
+            "phone_number": self.twilio_number,
+        }
+        # Add termination URI if provided
+        if termination_uri:
+            import_kwargs["termination_uri"] = termination_uri
+        # Add SIP credentials if provided
+        if sip_trunk_auth_username and sip_trunk_auth_password:
+            import_kwargs["sip_trunk_auth_username"] = sip_trunk_auth_username
+            import_kwargs["sip_trunk_auth_password"] = sip_trunk_auth_password
+        # Set outbound agent (required for outbound calls)
+        if outbound_agent_id:
+            import_kwargs["outbound_agent_id"] = outbound_agent_id
+        elif self.retell_agent_id:
+            import_kwargs["outbound_agent_id"] = self.retell_agent_id
+        # Set inbound agent if provided
+        if inbound_agent_id:
+            import_kwargs["inbound_agent_id"] = inbound_agent_id
+        # Add nickname if provided
+        if nickname:
+            import_kwargs["nickname"] = nickname
+        try:
+            response = self.retell_client.phone_number.import_(**import_kwargs)
+            print(f"✓ Phone number {self.twilio_number} successfully imported to Retell!")
+            print(f"  Phone Number: {response.phone_number}")
+            print(f"  Type: {response.phone_number_type}")
+            if hasattr(response, 'outbound_agent_id') and response.outbound_agent_id:
+                print(f"  Outbound Agent: {response.outbound_agent_id}")
+            if hasattr(response, 'inbound_agent_id') and response.inbound_agent_id:
+                print(f"  Inbound Agent: {response.inbound_agent_id}")
+            return response
+        except Exception as e:
+            print(f"✗ Error importing phone number: {e}")
+            print(f"\nNote: If you're using Twilio, you may need to:")
+            print(f"  1. Create an Elastic SIP Trunk in Twilio console")
+            print(f"  2. Provide the termination_uri parameter (e.g., 'yourtrunk.pstn.twilio.com')")
+            print(f"  3. Or purchase the number directly through Retell dashboard")
+            raise
+    def start_outbound_call(self, phone_number: str, prompt_content: str = None, call_id: str = None) -> str:
+        call_kwargs = {
+            "from_number": self.twilio_number,
+            "to_number": phone_number,
+            "override_agent_id": self.retell_agent_id,
+        }
+        if call_id:
+            call_kwargs["metadata"] = {"call_id": call_id}
+        # Only set up dynamic variables if we have a custom prompt
+        if prompt_content:
+            enhanced_prompt = (
+                f"{prompt_content}\n\n"
+                "SYSTEM INSTRUCTION: Continually check the dynamic variable {{force_end}}. "
+                "If {{force_end}} == 'true', you must IMMEDIATELY say goodbye and activate your end_call tool to terminate the call."
+            )
+            call_kwargs["retell_llm_dynamic_variables"] = {
+                "prompt_content": enhanced_prompt,
+                "force_end": "false"
+            }
+        call_response = self.retell_client.call.create_phone_call(**call_kwargs)
+        print(f"Call created successfully!")
+        print(f"Retell Call ID: {call_response.call_id}")
+        print(f"Call Status: {call_response.call_status}")
+        return call_response.call_id
+    def delete_room(self, call_id: str):
+        try:
+            call_data = self.retell_client.call.retrieve(call_id)
+            print(f"Current call status: {call_data.call_status}")
+            if call_data.call_status in ['registered', 'ongoing', 'dialing']:
+                print(f"Triggering end for Retell call {call_id}...")
+                self.retell_client.call.update(
+                    call_id,
+                    override_dynamic_variables={"force_end": "true"}
+                )
+                print("✓ force_end override sent to Retell API")
+            else:
+                print(f"Call already ended: {call_data.call_status}")
+        except Exception as e:
+            print(f"Error ending call {call_id}: {e}")
+            raise
+    def start_stream(self, call_id: str, rtmp_urls: List[str]):
+        """
+        Starts a Twilio Media Stream.
+        Note: Twilio streams are WebSocket-based, so only the first entry of rtmp_urls is used and it must be a WSS URL.
+        """
+        if not rtmp_urls:
+            raise ValueError("No stream URLs provided")
+        self.twilio_client.calls(call_id).streams.create(
+            url=rtmp_urls[0]
+        )
+    def start_recording(self, call_id: str, output_filepath: Optional[str] = None, upload_to_s3: bool = True, wait_for_completion: bool = True):
+        """
+        Triggers a recording on the active Twilio call.
+        Args:
+            call_id: The Twilio Call SID.
+            output_filepath: Optional filename for the recording.
+            upload_to_s3: If True, uploads to S3.
+            wait_for_completion: If True, waits for recording to finish and then uploads.
+        Returns:
+            The Twilio Recording SID.
+        """
+        # Start Twilio recording
+        recording = self.twilio_client.calls(call_id).recordings.create()
+        print(f"Recording started: {recording.sid}")
+        if not wait_for_completion:
+            return recording.sid
+        # Poll for recording completion
+        print("Waiting for recording to complete...")
+        while True:
+            rec_status = self.twilio_client.recordings(recording.sid).fetch()
+            if rec_status.status == 'completed':
+                print("Recording completed.")
+                break
+            elif rec_status.status in ['failed', 'absent']:
+                raise RuntimeError(f"Recording failed with status: {rec_status.status}")
+            time.sleep(5)
+        if not upload_to_s3:
+            return recording.sid
+        # Download recording from Twilio
+        media_url = f"https://api.twilio.com/2010-04-01/Accounts/{self.twilio_account_sid}/Recordings/{recording.sid}.mp3"
+        print(f"Downloading recording from: {media_url}")
+        response = requests.get(media_url, auth=(self.twilio_account_sid, self.twilio_auth_token))
+        if response.status_code != 200:
+            raise RuntimeError(f"Failed to download recording: {response.status_code} {response.text}")
+        # Upload to S3
+        access_key = os.getenv("AWS_ACCESS_KEY_ID")
+        secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
+        bucket = os.getenv("AWS_S3_BUCKET")
+        region = os.getenv("AWS_REGION")
+        if not access_key or not secret_key or not bucket:
+            raise ValueError("AWS credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_S3_BUCKET) are required for S3 upload.")
+        filename = output_filepath if output_filepath else f"{call_id}-{uuid.uuid4().hex[:6]}.mp3"
+        s3 = boto3.client(
+            's3',
+            aws_access_key_id=access_key,
+            aws_secret_access_key=secret_key,
+            region_name=region
+        )
+        print(f"Uploading to S3: s3://{bucket}/{filename}")
+        s3.put_object(Bucket=bucket, Key=filename, Body=response.content)
+        print(f"Upload complete: s3://{bucket}/{filename}")
+        # Also save locally
+        local_dir = "recordings"
+        os.makedirs(local_dir, exist_ok=True)
+        local_path = os.path.join(local_dir, filename)
+        with open(local_path, 'wb') as f:
+            f.write(response.content)
+        print(f"Recording saved locally: {local_path}")
+        return recording.sid
+    def mute_participant(self, call_id: str, identity: str, track_sid: str, muted: bool):
+        """
+        Mutes the participant on the Twilio call.
+        This prevents audio from reaching the Retell AI.
+        """
+        self.twilio_client.calls(call_id).update(muted=muted)
+    def kick_participant(self, call_id: str, identity: str):
+        """
+        Alias for delete_room (hangup).
+        """
+        self.delete_room(call_id)
+    def send_alert(self, call_id: str, message: str, participant_identity: Optional[str] = None):
+        """
+        Not fully supported in this hybrid model
+        """
+        raise NotImplementedError("send_alert is not currently supported in RetellManager")
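
In `start_recording` above, the `while True` polling loop has no upper bound, so a recording that never reaches `completed`, `failed`, or `absent` would block the caller indefinitely. A minimal sketch of the same loop with a deadline; `wait_for_status` and its parameters are hypothetical, and the `sleep` and `clock` arguments exist only so the loop can be exercised without real waiting:

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    fetch_status: Callable[[], str],
    done: Iterable[str] = ("completed",),
    failed: Iterable[str] = ("failed", "absent"),
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status until it reports success, failure, or the deadline passes."""
    done_set = set(done)
    failed_set = set(failed)
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in done_set:
            return status
        if status in failed_set:
            raise RuntimeError(f"Recording failed with status: {status}")
        if clock() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout} seconds")
        sleep(interval)
```

Inside `start_recording`, `fetch_status` could wrap the existing call, for example `lambda: self.twilio_client.recordings(recording.sid).fetch().status`, which mirrors the expression already used in the loop.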

intellema_vdk-0.2.0/intellema_vdk/speech_lib/__init__.py ADDED

@@ -0,0 +1,2 @@
+from .stt_client import STTManager
+from .tts_streamer import TTSStreamer
