PyPI - vision-agents - Versions diffs - 0.2.3__tar.gz - Mend

vision-agents 0.2.3__tar.gz


Files changed (73)

vision_agents-0.2.3/.gitignore ADDED

@@ -0,0 +1,90 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.cursor/*
+# Distribution / packaging
+.Python
+build/
+dist/
+downloads/
+develop-eggs/
+eggs/
+.eggs/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+pip-wheel-metadata/
+MANIFEST
+*.egg-info/
+*.egg
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+coverage.xml
+nosetests.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+# Type checker / lint caches
+.mypy_cache/
+.dmypy.json
+dmypy.json
+.pytype/
+.pyre/
+.ruff_cache/
+# Environments
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+.env
+.env.local
+.env.*.local
+.env.bak
+pyvenv.cfg
+.python-version
+# Editors / IDEs
+.vscode/
+.idea/
+# Jupyter Notebook
+.ipynb_checkpoints/
+# OS / Misc
+.DS_Store
+*.log
+# Tooling & repo-specific
+pyrightconfig.json
+shell.nix
+bin/*
+lib/*
+stream-py/
+# Artifacts / assets
+*.pt
+*.kef
+*.onnx
+profile.html
+/opencode.json

vision_agents-0.2.3/PKG-INFO ADDED

@@ -0,0 +1,91 @@
+Metadata-Version: 2.4
+Name: vision-agents
+Version: 0.2.3
+Summary: Open video agents. Build low latency video and voice agents on any realtime edge network.
+Project-URL: Documentation, https://visionagents.ai/
+Project-URL: Website, https://visionagents.ai/
+Project-URL: Source, https://github.com/GetStream/Vision-Agents
+Keywords: AI,agents,video AI,video agents,voice AI,voice agents
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: Implementation :: CPython
+Requires-Python: >=3.10
+Requires-Dist: colorlog>=6.10.1
+Requires-Dist: getstream[telemetry,webrtc]>=2.5.16
+Requires-Dist: mcp>=1.16.0
+Requires-Dist: numpy>=1.24.0
+Requires-Dist: pillow>=10.4.0
+Requires-Dist: python-dotenv>=1.1.1
+Provides-Extra: all-plugins
+Requires-Dist: vision-agents-plugins-anthropic; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-cartesia; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-deepgram; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-elevenlabs; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-gemini; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-getstream; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-heygen; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-inworld; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-kokoro; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-moonshine; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-openai; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-roboflow; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-smart-turn; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-ultralytics; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-wizper; extra == 'all-plugins'
+Requires-Dist: vision-agents-plugins-xai; extra == 'all-plugins'
+Provides-Extra: anthropic
+Requires-Dist: vision-agents-plugins-anthropic; extra == 'anthropic'
+Provides-Extra: cartesia
+Requires-Dist: vision-agents-plugins-cartesia; extra == 'cartesia'
+Provides-Extra: deepgram
+Requires-Dist: vision-agents-plugins-deepgram; extra == 'deepgram'
+Provides-Extra: dev
+Requires-Dist: click; extra == 'dev'
+Requires-Dist: mypy; extra == 'dev'
+Requires-Dist: pytest; extra == 'dev'
+Requires-Dist: ruff; extra == 'dev'
+Provides-Extra: elevenlabs
+Requires-Dist: vision-agents-plugins-elevenlabs; extra == 'elevenlabs'
+Provides-Extra: gemini
+Requires-Dist: vision-agents-plugins-gemini; extra == 'gemini'
+Provides-Extra: getstream
+Requires-Dist: vision-agents-plugins-getstream; extra == 'getstream'
+Provides-Extra: heygen
+Requires-Dist: vision-agents-plugins-heygen; extra == 'heygen'
+Provides-Extra: inworld
+Requires-Dist: vision-agents-plugins-inworld; extra == 'inworld'
+Provides-Extra: kokoro
+Requires-Dist: vision-agents-plugins-kokoro; extra == 'kokoro'
+Provides-Extra: moonshine
+Requires-Dist: vision-agents-plugins-moonshine; extra == 'moonshine'
+Provides-Extra: openai
+Requires-Dist: vision-agents-plugins-openai; extra == 'openai'
+Provides-Extra: roboflow
+Requires-Dist: vision-agents-plugins-roboflow; extra == 'roboflow'
+Provides-Extra: smart-turn
+Requires-Dist: vision-agents-plugins-smart-turn; extra == 'smart-turn'
+Provides-Extra: ultralytics
+Requires-Dist: vision-agents-plugins-ultralytics; extra == 'ultralytics'
+Provides-Extra: wizper
+Requires-Dist: vision-agents-plugins-wizper; extra == 'wizper'
+Provides-Extra: xai
+Requires-Dist: vision-agents-plugins-xai; extra == 'xai'
+Description-Content-Type: text/markdown
+# Open Vision Agents by Stream
+Build Vision Agents quickly with any model or video provider.
+-  **Video AI**: Built for real-time video AI. Combine YOLO, Roboflow and others with Gemini/OpenAI realtime
+-  **Low Latency**: Join quickly (500ms) with low audio/video latency (30ms)
+-  **Open**: Built by Stream, but use any video edge network that you like
+-  **Native APIs**: Native SDK methods from OpenAI (create response), Gemini (generate) and Claude (create message), so you're never behind on the latest features
+-  **SDKs**: SDKs for React, Android, iOS, Flutter, React Native and Unity.
+Created by Stream, uses [Stream's edge network](https://getstream.io/video/) for ultra-low latency.
+See [GitHub](https://github.com/GetStream/Vision-Agents).
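
The PKG-INFO header block above uses the email-style `Key: value` layout of Python core metadata, so it can be read with the standard library's `email` parser. The following is a minimal sketch under stated assumptions: `PKG_INFO_EXCERPT` is a shortened copy of a few fields shown above, and `extras_to_requirements` is an illustrative helper, not something shipped in the package. The marker matching is deliberately naive string comparison; a dedicated requirements parser would be the more robust choice.

```python
from email import message_from_string

# Shortened excerpt of the metadata shown above (illustrative, not the full file).
PKG_INFO_EXCERPT = """\
Metadata-Version: 2.4
Name: vision-agents
Version: 0.2.3
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24.0
Provides-Extra: openai
Requires-Dist: vision-agents-plugins-openai; extra == 'openai'
Provides-Extra: smart-turn
Requires-Dist: vision-agents-plugins-smart-turn; extra == 'smart-turn'
"""


def extras_to_requirements(pkg_info_text):
    """Group Requires-Dist entries under the extra named in their marker."""
    msg = message_from_string(pkg_info_text)
    extras = {name: [] for name in msg.get_all("Provides-Extra", [])}
    for req in msg.get_all("Requires-Dist", []):
        # Split "name-and-version; marker" into its two halves.
        spec, _, marker = req.partition(";")
        marker = marker.strip()
        for name in extras:
            # Naive exact match on the marker text used in this file.
            if marker == f"extra == '{name}'":
                extras[name].append(spec.strip())
    return extras


metadata = message_from_string(PKG_INFO_EXCERPT)
print(metadata["Name"], metadata["Version"])
print(extras_to_requirements(PKG_INFO_EXCERPT))
```

Requirements without an `extra` marker (such as `numpy>=1.24.0`) are the unconditional dependencies and are intentionally left out of the returned mapping.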

vision_agents-0.2.3/README.md ADDED

@@ -0,0 +1,13 @@
+# Open Vision Agents by Stream
+Build Vision Agents quickly with any model or video provider.
+-  **Video AI**: Built for real-time video AI. Combine YOLO, Roboflow and others with Gemini/OpenAI realtime
+-  **Low Latency**: Join quickly (500ms) with low audio/video latency (30ms)
+-  **Open**: Built by Stream, but use any video edge network that you like
+-  **Native APIs**: Native SDK methods from OpenAI (create response), Gemini (generate) and Claude (create message), so you're never behind on the latest features
+-  **SDKs**: SDKs for React, Android, iOS, Flutter, React Native and Unity.
+Created by Stream, uses [Stream's edge network](https://getstream.io/video/) for ultra-low latency.
+See [GitHub](https://github.com/GetStream/Vision-Agents).

vision_agents-0.2.3/pyproject.toml ADDED

@@ -0,0 +1,91 @@
+[build-system]
+requires = ["hatchling", "hatch-vcs", "setuptools-scm"]
+build-backend = "hatchling.build"
+[project]
+name = "vision-agents"
+description = "Open video agents. Build low latency video and voice agents on any realtime edge network."
+readme = "README.md"
+keywords = ["video AI", "AI", "voice AI", "video agents", "voice agents", "agents", "AI"]
+dynamic = ["version"]
+classifiers = [
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Programming Language :: Python :: Implementation :: CPython",
+    "Operating System :: OS Independent",
+]
+requires-python = ">=3.10"
+dependencies = [
+    "getstream[webrtc,telemetry]>=2.5.16",
+    "python-dotenv>=1.1.1",
+    "pillow>=10.4.0",  # Compatible with moondream SDK (<11.0.0)
+    "numpy>=1.24.0",
+    "mcp>=1.16.0",
+    "colorlog>=6.10.1",
+]
+[project.urls]
+Documentation = "https://visionagents.ai/"
+Website = "https://visionagents.ai/"
+Source = "https://github.com/GetStream/Vision-Agents"
+[project.optional-dependencies]
+dev = ["pytest", "mypy", "ruff", "click"]
+anthropic = ["vision-agents-plugins-anthropic"]
+cartesia = ["vision-agents-plugins-cartesia"]
+deepgram = ["vision-agents-plugins-deepgram"]
+elevenlabs = ["vision-agents-plugins-elevenlabs"]
+gemini = ["vision-agents-plugins-gemini"]
+getstream = ["vision-agents-plugins-getstream"]
+heygen = ["vision-agents-plugins-heygen"]
+inworld = ["vision-agents-plugins-inworld"]
+kokoro = ["vision-agents-plugins-kokoro"]
+moonshine = ["vision-agents-plugins-moonshine"]
+openai = ["vision-agents-plugins-openai"]
+roboflow = ["vision-agents-plugins-roboflow"]
+smart_turn = ["vision-agents-plugins-smart-turn"]
+ultralytics = ["vision-agents-plugins-ultralytics"]
+wizper = ["vision-agents-plugins-wizper"]
+xai = ["vision-agents-plugins-xai"]
+all-plugins = [
+  "vision-agents-plugins-anthropic",
+  "vision-agents-plugins-cartesia",
+  "vision-agents-plugins-deepgram",
+  "vision-agents-plugins-elevenlabs",
+  "vision-agents-plugins-gemini",
+  "vision-agents-plugins-getstream",
+  "vision-agents-plugins-heygen",
+  "vision-agents-plugins-inworld",
+  "vision-agents-plugins-kokoro",
+  "vision-agents-plugins-moonshine",
+  "vision-agents-plugins-roboflow",
+  "vision-agents-plugins-openai",
+  "vision-agents-plugins-smart-turn",
+  "vision-agents-plugins-ultralytics",
+  "vision-agents-plugins-wizper",
+  "vision-agents-plugins-xai",
+]
+[tool.hatch.metadata]
+allow-direct-references = true
+[tool.hatch.version]
+source = "vcs"
+raw-options = { root = "..", search_parent_directories = true, fallback_version = "0.0.0" }
+[tool.hatch.build.targets.wheel]
+packages = ["vision_agents"]
+[tool.hatch.build.targets.sdist]
+include = ["vision_agents"]
+#[tool.uv.sources]
+# getstream = { git = "https://github.com/GetStream/stream-py.git", branch = "audio-more" }
+# for local development
+# getstream = { path = "../../stream-py/", editable = true }
+# aiortc = { path = "../stream-py/", editable = true }
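
The extra declared as `smart_turn` in this pyproject.toml appears as `smart-turn` in the PKG-INFO of the same release. That is consistent with the extra-name normalization described in PEP 685, which reuses the PEP 503 rule for project names. A minimal sketch of that rule follows; the function name `normalize_extra` is illustrative and not part of any packaging tool's API.

```python
import re


def normalize_extra(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into a single '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Names as declared under [project.optional-dependencies] above.
for declared in ["smart_turn", "all-plugins", "openai"]:
    print(declared, "->", normalize_extra(declared))
```

Under this rule, `vision-agents[smart_turn]` and `vision-agents[smart-turn]` would name the same extra for installers that implement the normalization.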

vision_agents-0.2.3/vision_agents/PROTOBUF_GENERATION.md ADDED

@@ -0,0 +1,286 @@
+# Protobuf Event Generation
+## Overview
+The `_generate_sfu_events.py` script automatically generates Python dataclass wrappers for protobuf messages from the SFU (Selective Forwarding Unit) event system. These generated classes inherit from `BaseEvent` and provide type-safe access to protobuf fields with all fields being optional.
+## Location
+- **Generator Script**: `agents-core/vision_agents/_generate_sfu_events.py`
+- **Generated Output**: `agents-core/vision_agents/core/edge/sfu_events.py`
+## Key Features
+### 1. BaseEvent Inheritance
+All generated classes inherit from `BaseEvent`, providing:
+- `type`: Event type identifier (auto-set from protobuf full name)
+- `event_id`: Unique identifier (auto-generated UUID)
+- `timestamp`: Event creation time (auto-generated)
+- `session_id`: Optional session identifier
+- `user_metadata`: Optional user metadata
+### 2. Optional Fields
+All fields are optional, allowing event creation without a payload:
+```python
+event = AudioLevelEvent()  # All fields are optional
+```
+### 3. Advanced Type Mapping
+The generator uses `_get_python_type_from_protobuf_field()` to map protobuf types to Python types with **full nested message type resolution**:
+- Protobuf scalar types → Python primitives (int, float, str, bool, bytes)
+- Protobuf repeated fields → `Optional[List[T]]`
+- **Protobuf message types → Proper typed dataclass wrappers** (e.g., `Optional[Participant]`)
+- Protobuf enum types → `Optional[int]`
+All types are wrapped in `Optional` for flexibility.
+#### Message Type Wrappers
+The generator automatically creates dataclass wrappers for all protobuf message types used in events. These wrappers:
+- Are placed at the top of the generated file
+- Include all fields with proper Python types
+- Support nested message types recursively
+- Provide `from_proto()` class method for conversion
+- Are fully typed for IDE autocomplete and type checking
+Example:
+```python
+@dataclass
+class Participant(DataClassJsonMixin):
+    """Wrapper for stream.video.sfu.models.Participant."""
+    user_id: Optional[str] = None
+    session_id: Optional[str] = None
+    name: Optional[str] = None
+    is_speaking: Optional[bool] = None
+    audio_level: Optional[float] = None
+    # ... all other fields
+    @classmethod
+    def from_proto(cls, proto_obj) -> 'Participant':
+        """Create from protobuf Participant."""
+        # ... conversion logic
+```
+### 4. Property-Based Access
+Protobuf fields are exposed as properties with proper type hints:
+```python
+@property
+def user_id(self) -> Optional[str]:
+    """Access user_id field from the protobuf payload."""
+    if self.payload is None:
+        return None
+    return getattr(self.payload, 'user_id', None)
+```
+### 5. Protobuf Integration
+Each generated class provides:
+- `from_proto(proto_obj)`: Create event from protobuf message
+- `as_dict()`: Convert protobuf payload to dictionary
+- `__getattr__()`: Delegate attribute access to protobuf payload
+## Usage
+### Regenerating Events
+```bash
+cd agents-core
+uv run python vision_agents/_generate_sfu_events.py
+```
+### Verification
+Verify type mappings and generated classes:
+```bash
+# Show type mappings
+uv run python vision_agents/_generate_sfu_events.py --verify-types
+# Verify generated classes
+uv run python vision_agents/_generate_sfu_events.py --verify
+# Both
+uv run python vision_agents/_generate_sfu_events.py --verify-types --verify
+```
+### Example Usage
+```python
+from vision_agents.core.edge.sfu_events import (
+    AudioLevelEvent,
+    TrackUnpublishedEvent,
+    Participant  # Now properly typed!
+)
+from getstream.video.rtc.pb.stream.video.sfu.event import events_pb2
+from getstream.video.rtc.pb.stream.video.sfu.models import models_pb2
+# Example 1: Simple event without payload
+event1 = AudioLevelEvent()
+print(event1.user_id)  # None
+# Example 2: Event from protobuf
+proto = events_pb2.AudioLevel(user_id='user123', level=0.85, is_speaking=True)
+event2 = AudioLevelEvent.from_proto(proto)
+print(event2.user_id)        # 'user123'
+print(event2.level)          # ~0.85 (a 32-bit protobuf float, so the printed value is not exactly 0.85)
+print(event2.is_speaking)    # True
+print(event2.as_dict())      # {'user_id': 'user123', 'level': 0.85, 'is_speaking': True}
+# Example 3: Event with nested message type (Participant)
+proto_participant = models_pb2.Participant(
+    user_id='user456',
+    name='John Doe',
+    is_speaking=True,
+    audio_level=0.92
+)
+proto_track = events_pb2.TrackUnpublished(
+    user_id='user456',
+    participant=proto_participant
+)
+event3 = TrackUnpublishedEvent.from_proto(proto_track)
+# Participant is properly typed as Participant dataclass!
+participant: Participant = event3.participant  # Type-safe!
+print(participant.user_id)     # 'user456'
+print(participant.name)        # 'John Doe'
+print(participant.is_speaking) # True
+print(participant.audio_level) # ~0.92 (subject to 32-bit float precision)
+# IDE autocomplete works perfectly for all Participant fields!
+```
+## Verification Functions
+### `_get_python_type_from_protobuf_field(field_descriptor)`
+Determines the appropriate Python type annotation from a protobuf field descriptor. Maps protobuf types to Python types with proper handling of:
+- Scalar types (int, float, str, bool, bytes)
+- Repeated fields (lists)
+- Message types (nested protobuf messages)
+- Enum types
+### `verify_field_types()`
+Displays a comprehensive report of all field type mappings for verification:
+```
+AudioLevelEvent (AudioLevel):
+  Protobuf type: stream.video.sfu.event.AudioLevel
+  - user_id: type=9 (required) → Optional[str]
+  - level: type=2 (required) → Optional[float]
+  - is_speaking: type=8 (required) → Optional[bool]
+```
+### `verify_generated_classes()`
+Verifies that generated classes match protobuf definitions by checking:
+- Class exists in generated module
+- All protobuf fields are accessible as properties
+- Properties have correct types
+- No missing or incorrect field mappings
+## Generated Class Structure
+Each generated class follows this pattern:
+```python
+@dataclass
+class AudioLevelEvent(BaseEvent):
+    """Dataclass event for video.sfu.event.events_pb2.AudioLevel."""
+    type: str = field(default="stream.video.sfu.event.AudioLevel", init=False)
+    payload: Optional[events_pb2.AudioLevel] = field(default=None, repr=False)
+    @property
+    def user_id(self) -> Optional[str]:
+        """Access user_id field from the protobuf payload."""
+        if self.payload is None:
+            return None
+        return getattr(self.payload, 'user_id', None)
+    # ... more properties ...
+    @classmethod
+    def from_proto(cls, proto_obj: events_pb2.AudioLevel, **extra):
+        """Create event instance from protobuf message."""
+        return cls(payload=proto_obj, **extra)
+    def as_dict(self) -> Dict[str, Any]:
+        """Convert protobuf payload to dictionary."""
+        if self.payload is None:
+            return {}
+        return _to_dict(self.payload)
+    def __getattr__(self, item: str):
+        """Delegate attribute access to protobuf payload."""
+        if self.payload is not None:
+            return getattr(self.payload, item)
+        raise AttributeError(f"'{self.__class__.__name__}' object has no attribute '{item}'")
+```
+## Import Strategy
+The edge module uses absolute rather than relative imports to avoid name clashes with standard library modules, specifically Python's `types` module.
+```python
+# In edge/__init__.py
+from vision_agents.core.edge.edge_transport import EdgeTransport
+from vision_agents.core.edge import sfu_events
+```
+## Event Manager Integration
+The EventManager has been updated to seamlessly handle the new protobuf events:
+### How It Works
+1. **Register protobuf event classes** like any other event:
+   ```python
+   from vision_agents.core.events.manager import EventManager
+   from vision_agents.core.edge.sfu_events import AudioLevelEvent
+   manager = EventManager()
+   manager.register(AudioLevelEvent)
+   ```
+2. **Send events** in three ways:
+   - Send wrapped events (already BaseEvent):
+     ```python
+     proto = events_pb2.AudioLevel(user_id='user123', level=0.85)
+     event = AudioLevelEvent.from_proto(proto, session_id='session123')
+     manager.send(event)  # BaseEvent fields preserved
+     ```
+   - Send raw protobuf messages (auto-wrapped):
+     ```python
+     proto = events_pb2.AudioLevel(user_id='user456', level=0.95)
+     manager.send(proto)  # Automatically wrapped in AudioLevelEvent
+     ```
+   - Create events without payload (all fields optional):
+     ```python
+     event = AudioLevelEvent()  # No protobuf payload needed
+     manager.send(event)
+     ```
+3. **Subscribe to protobuf events** like any other event:
+   ```python
+   @manager.subscribe
+   async def handle_audio(event: AudioLevelEvent):
+       print(f"User: {event.user_id}, Level: {event.level}")
+       print(f"Session: {event.session_id}, ID: {event.event_id}")
+   ```
+### Key Improvements
+- **No double-wrapping**: Already-wrapped BaseEvent subclasses are not re-wrapped
+- **BaseEvent fields preserved**: session_id, event_id, timestamp all work correctly
+- **Simplified logic**: Single check distinguishes raw protobuf from wrapped events
+- **Type safety**: All generated events properly inherit from BaseEvent
+- **Flexible usage**: Use raw protobuf or wrapped events interchangeably
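
The wrapping behaviour described in PROTOBUF_GENERATION.md (payload delegation, all fields optional, and no double-wrapping on send) can be sketched without protobuf installed. Everything below is a stand-in written for illustration: `SketchBaseEvent`, `SketchAudioLevelEvent`, `FakeAudioLevel` and `SketchManager` are hypothetical names, and the two-argument `register()` is an assumption of this sketch, not the signature of the library's `EventManager`.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SketchBaseEvent:
    """Hypothetical stand-in for BaseEvent (not the library class)."""
    type: str = ""
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    session_id: Optional[str] = None


@dataclass
class FakeAudioLevel:
    """Plain-Python stand-in for the protobuf AudioLevel message."""
    user_id: str = ""
    level: float = 0.0


@dataclass
class SketchAudioLevelEvent(SketchBaseEvent):
    type: str = field(default="stream.video.sfu.event.AudioLevel", init=False)
    payload: Optional[Any] = field(default=None, repr=False)

    @property
    def user_id(self) -> Optional[str]:
        # Typed accessor: returns None when there is no payload.
        if self.payload is None:
            return None
        return getattr(self.payload, "user_id", None)

    @classmethod
    def from_proto(cls, proto_obj, **extra):
        return cls(payload=proto_obj, **extra)

    def __getattr__(self, item: str):
        # Only reached when normal lookup fails; delegate to the payload.
        payload = self.__dict__.get("payload")
        if payload is not None:
            return getattr(payload, item)
        raise AttributeError(f"{type(self).__name__!r} has no attribute {item!r}")


class SketchManager:
    """Hypothetical manager: wraps raw payloads once, passes events through."""

    def __init__(self):
        self._wrappers = {}  # payload type -> event class
        self.sent = []

    def register(self, payload_type, event_cls):
        self._wrappers[payload_type] = event_cls

    def send(self, obj):
        if isinstance(obj, SketchBaseEvent):
            event = obj  # already wrapped: keep session_id, event_id as-is
        else:
            event = self._wrappers[type(obj)].from_proto(obj)
        self.sent.append(event)
        return event


manager = SketchManager()
manager.register(FakeAudioLevel, SketchAudioLevelEvent)

wrapped = SketchAudioLevelEvent.from_proto(
    FakeAudioLevel("user123", 0.85), session_id="session123"
)
same = manager.send(wrapped)                          # passed through unchanged
auto = manager.send(FakeAudioLevel("user456", 0.95))  # wrapped automatically
empty = manager.send(SketchAudioLevelEvent())         # no payload at all

print(same is wrapped, same.session_id)
print(auto.user_id, auto.level)
print(empty.user_id)
```

The single `isinstance` check in `send()` is what keeps an already-wrapped event from being wrapped again, which is the property the "Key Improvements" list calls out.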