PyPI - srx-lib-azure - Versions diffs - 0.1.8__tar.gz → 0.3.0__tar.gz - Mend

srx-lib-azure 0.1.8.tar.gz → 0.3.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

srx_lib_azure-0.3.0/PKG-INFO ADDED

@@ -0,0 +1,134 @@
+Metadata-Version: 2.4
+Name: srx-lib-azure
+Version: 0.3.0
+Summary: Azure helpers for SRX services: Blob, Email, Table, Document Intelligence, Speech Services
+Author-email: SRX <dev@srx.id>
+Requires-Python: >=3.12
+Requires-Dist: azure-ai-documentintelligence>=1.0.0
+Requires-Dist: azure-communication-email>=1.0.0
+Requires-Dist: azure-data-tables>=12.7.0
+Requires-Dist: azure-storage-blob>=12.22.0
+Requires-Dist: loguru>=0.7.2
+Provides-Extra: all
+Requires-Dist: azure-ai-documentintelligence>=1.0.0; extra == 'all'
+Requires-Dist: azure-cognitiveservices-speech>=1.41.1; extra == 'all'
+Provides-Extra: document
+Requires-Dist: azure-ai-documentintelligence>=1.0.0; extra == 'document'
+Provides-Extra: speech
+Requires-Dist: azure-cognitiveservices-speech>=1.41.1; extra == 'speech'
+Description-Content-Type: text/markdown
+# srx-lib-azure
+Lightweight wrappers over Azure SDKs used across SRX services.
+What it includes:
+- **Blob**: upload/download helpers, SAS URL generation
+- **Email** (Azure Communication Services): simple async sender
+- **Table**: simple CRUD helpers
+- **Document Intelligence** (OCR): document analysis from URLs or bytes
+## Install
+PyPI (public):
+- `pip install srx-lib-azure`
+uv (pyproject):
+```
+[project]
+dependencies = ["srx-lib-azure>=0.1.0"]
+```
+## Usage
+Blob:
+```
+from srx_lib_azure.blob import AzureBlobService
+blob = AzureBlobService()
+url = await blob.upload_file(upload_file, "documents/report.pdf")
+```
+Email:
+```
+from srx_lib_azure.email import EmailService
+svc = EmailService()
+await svc.send_notification("user@example.com", "Subject", "Hello", html=False)
+```
+Table:
+```
+from srx_lib_azure.table import AzureTableService
+store = AzureTableService()
+store.ensure_table("events")
+store.upsert_entity("events", {"PartitionKey":"p","RowKey":"r","EventType":"x"})
+```
+Document Intelligence (OCR):
+```python
+from srx_lib_azure import AzureDocumentIntelligenceService
+# Initialize with endpoint and key
+doc_service = AzureDocumentIntelligenceService(
+    endpoint="https://your-resource.cognitiveservices.azure.com/",
+    key="your-api-key"
+)
+# Analyze document from URL
+result = await doc_service.analyze_document_from_url(
+    url="https://example.com/document.pdf",
+    model_id="prebuilt-read"  # or "prebuilt-layout", "prebuilt-invoice", etc.
+)
+# Analyze document from bytes
+with open("document.pdf", "rb") as f:
+    content = f.read()
+result = await doc_service.analyze_document_from_bytes(
+    file_content=content,
+    model_id="prebuilt-read"
+)
+# Result structure:
+# {
+#     "success": True/False,
+#     "content": "extracted text...",
+#     "pages": [{"page_number": 1, "width": 8.5, ...}, ...],
+#     "page_count": 10,
+#     "confidence": 0.98,
+#     "model_id": "prebuilt-read",
+#     "metadata": {...},
+#     "error": None  # or error message if failed
+# }
+```
+## Environment Variables
+- **Blob & Table**: `AZURE_STORAGE_CONNECTION_STRING` (required)
+- **Email (ACS)**: `ACS_CONNECTION_STRING`, `EMAIL_SENDER`
+- **Document Intelligence**: `AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT`, `AZURE_DOCUMENT_INTELLIGENCE_KEY`
+- **Speech**: `AZURE_SPEECH_KEY`, plus `AZURE_SPEECH_REGION` or `AZURE_SPEECH_ENDPOINT`
+- **Optional**: `AZURE_STORAGE_ACCOUNT_KEY`, `AZURE_BLOB_URL`, `AZURE_SAS_TOKEN`
+## Optional Dependencies
+Blob, Email, Table, and Document Intelligence dependencies ship with the base install; only Speech Services needs an extra:
+```bash
+# Base installation (Blob, Email, Table, Document Intelligence)
+pip install srx-lib-azure
+# Document Intelligence extra (already covered by the base dependencies)
+pip install srx-lib-azure[document]
+# Speech Services (audio transcription)
+pip install srx-lib-azure[speech]
+# Install with all optional dependencies
+pip install srx-lib-azure[all]
+```
+If the Speech SDK is not installed, importing the package logs a warning instead of crashing; speech operations raise `RuntimeError` until the SDK is available.
+## Release
+Tag `vX.Y.Z` to publish to GitHub Packages via Actions.
+## License
+Proprietary © SRX

srx_lib_azure-0.3.0/README.md ADDED

@@ -0,0 +1,114 @@
+# srx-lib-azure
+Lightweight wrappers over Azure SDKs used across SRX services.
+What it includes:
+- **Blob**: upload/download helpers, SAS URL generation
+- **Email** (Azure Communication Services): simple async sender
+- **Table**: simple CRUD helpers
+- **Document Intelligence** (OCR): document analysis from URLs or bytes
+## Install
+PyPI (public):
+- `pip install srx-lib-azure`
+uv (pyproject):
+```
+[project]
+dependencies = ["srx-lib-azure>=0.1.0"]
+```
+## Usage
+Blob:
+```
+from srx_lib_azure.blob import AzureBlobService
+blob = AzureBlobService()
+url = await blob.upload_file(upload_file, "documents/report.pdf")
+```
+Email:
+```
+from srx_lib_azure.email import EmailService
+svc = EmailService()
+await svc.send_notification("user@example.com", "Subject", "Hello", html=False)
+```
+Table:
+```
+from srx_lib_azure.table import AzureTableService
+store = AzureTableService()
+store.ensure_table("events")
+store.upsert_entity("events", {"PartitionKey":"p","RowKey":"r","EventType":"x"})
+```
+Document Intelligence (OCR):
+```python
+from srx_lib_azure import AzureDocumentIntelligenceService
+# Initialize with endpoint and key
+doc_service = AzureDocumentIntelligenceService(
+    endpoint="https://your-resource.cognitiveservices.azure.com/",
+    key="your-api-key"
+)
+# Analyze document from URL
+result = await doc_service.analyze_document_from_url(
+    url="https://example.com/document.pdf",
+    model_id="prebuilt-read"  # or "prebuilt-layout", "prebuilt-invoice", etc.
+)
+# Analyze document from bytes
+with open("document.pdf", "rb") as f:
+    content = f.read()
+result = await doc_service.analyze_document_from_bytes(
+    file_content=content,
+    model_id="prebuilt-read"
+)
+# Result structure:
+# {
+#     "success": True/False,
+#     "content": "extracted text...",
+#     "pages": [{"page_number": 1, "width": 8.5, ...}, ...],
+#     "page_count": 10,
+#     "confidence": 0.98,
+#     "model_id": "prebuilt-read",
+#     "metadata": {...},
+#     "error": None  # or error message if failed
+# }
+```
+## Environment Variables
+- **Blob & Table**: `AZURE_STORAGE_CONNECTION_STRING` (required)
+- **Email (ACS)**: `ACS_CONNECTION_STRING`, `EMAIL_SENDER`
+- **Document Intelligence**: `AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT`, `AZURE_DOCUMENT_INTELLIGENCE_KEY`
+- **Speech**: `AZURE_SPEECH_KEY`, plus `AZURE_SPEECH_REGION` or `AZURE_SPEECH_ENDPOINT`
+- **Optional**: `AZURE_STORAGE_ACCOUNT_KEY`, `AZURE_BLOB_URL`, `AZURE_SAS_TOKEN`
+## Optional Dependencies
+Blob, Email, Table, and Document Intelligence dependencies ship with the base install; only Speech Services needs an extra:
+```bash
+# Base installation (Blob, Email, Table, Document Intelligence)
+pip install srx-lib-azure
+# Document Intelligence extra (already covered by the base dependencies)
+pip install srx-lib-azure[document]
+# Speech Services (audio transcription)
+pip install srx-lib-azure[speech]
+# Install with all optional dependencies
+pip install srx-lib-azure[all]
+```
+If the Speech SDK is not installed, importing the package logs a warning instead of crashing; speech operations raise `RuntimeError` until the SDK is available.
+## Release
+Tag `vX.Y.Z` to publish to GitHub Packages via Actions.
+## License
+Proprietary © SRX
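The README documents a result dict with `success`, `content`, `page_count`, `confidence`, and `error` keys. As a rough illustration of how a caller might consume that shape, here is a minimal sketch; `summarize_result` is a hypothetical helper and is not part of the package.

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> str:
    """Turn a documented-shape result dict into a one-line summary.

    Hypothetical helper: only the key names come from the README above.
    """
    if not result.get("success"):
        # Failure responses carry only "success" and "error".
        return f"failed: {result.get('error', 'unknown error')}"
    pages = result.get("page_count") or 0
    confidence = result.get("confidence")
    # Confidence may be None when no per-line confidence was reported.
    conf_text = f"{confidence:.2f}" if confidence is not None else "n/a"
    text = result.get("content") or ""
    return f"{pages} page(s), confidence {conf_text}, {len(text)} chars"
```

Checking `success` first matters because the failure responses omit every other key except `error`.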

{srx_lib_azure-0.1.8 → srx_lib_azure-0.3.0}/pyproject.toml RENAMED

@@ -4,8 +4,8 @@ build-backend = "hatchling.build"
 [project]
 name = "srx-lib-azure"
-version = "0.1.8"
-description = "Azure helpers for SRX services: Blob, Email, Table"
+version = "0.3.0"
+description = "Azure helpers for SRX services: Blob, Email, Table, Document Intelligence, Speech Services"
 readme = "README.md"
 requires-python = ">=3.12"
 authors = [{ name = "SRX", email = "dev@srx.id" }]
@@ -14,6 +14,22 @@ dependencies = [
   "azure-storage-blob>=12.22.0",
   "azure-communication-email>=1.0.0",
   "azure-data-tables>=12.7.0",
+  "azure-ai-documentintelligence>=1.0.0",
+]
+[project.optional-dependencies]
+# Optional extra for Document Intelligence (OCR)
+document = [
+  "azure-ai-documentintelligence>=1.0.0",
+]
+# Optional extra for Speech Services (audio transcription)
+speech = [
+  "azure-cognitiveservices-speech>=1.41.1",
+]
+# Install all optional dependencies
+all = [
+  "azure-ai-documentintelligence>=1.0.0",
+  "azure-cognitiveservices-speech>=1.41.1",
 ]
 [tool.hatch.build.targets.wheel]

srx_lib_azure-0.3.0/src/srx_lib_azure/__init__.py ADDED

@@ -0,0 +1,23 @@
+from .blob import AzureBlobService
+from .document import AzureDocumentIntelligenceService
+from .email import EmailService
+from .table import AzureTableService
+# Optional import - only available if speech extra is installed
+try:
+    from .speech import AzureSpeechService
+    __all__ = [
+        "AzureBlobService",
+        "AzureDocumentIntelligenceService",
+        "AzureTableService",
+        "EmailService",
+        "AzureSpeechService",
+    ]
+except ImportError:
+    # Speech SDK not installed - service not available
+    __all__ = [
+        "AzureBlobService",
+        "AzureDocumentIntelligenceService",
+        "AzureTableService",
+        "EmailService",
+    ]
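The `__init__.py` above guards the speech import with `try`/`except ImportError`. A caller who wants to know up front whether the optional SDK is importable could probe for it without importing it. The sketch below is one possible approach; `has_module` and `SPEECH_READY` are hypothetical names, and only the module path `azure.cognitiveservices.speech` comes from the diff.

```python
import importlib.util


def has_module(dotted_name: str) -> bool:
    """Return True if `dotted_name` is importable, without importing the leaf module."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError subclass).
        return False


# Hypothetical feature flag; the module path is the one speech.py imports.
SPEECH_READY = has_module("azure.cognitiveservices.speech")
```

A flag like this lets an application decide at startup whether to expose transcription features instead of discovering the missing SDK on first use.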

srx_lib_azure-0.3.0/src/srx_lib_azure/document.py ADDED

@@ -0,0 +1,262 @@
+import os
+import io
+import asyncio
+from typing import Dict, Any, Optional
+from loguru import logger
+try:
+    from azure.ai.documentintelligence import DocumentIntelligenceClient
+    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest, AnalyzeResult
+    from azure.core.credentials import AzureKeyCredential
+    from azure.core.exceptions import (
+        ClientAuthenticationError,
+        HttpResponseError,
+        ServiceRequestError,
+    )
+except Exception:  # pragma: no cover - optional dependency at import time
+    DocumentIntelligenceClient = None  # type: ignore
+    AnalyzeDocumentRequest = None  # type: ignore
+    AnalyzeResult = None  # type: ignore
+    AzureKeyCredential = None  # type: ignore
+    ClientAuthenticationError = None  # type: ignore
+    HttpResponseError = None  # type: ignore
+    ServiceRequestError = None  # type: ignore
+class AzureDocumentIntelligenceService:
+    """Wrapper for Azure Document Intelligence (OCR/Document Analysis).
+    Does not raise on missing configuration to keep the library optional.
+    If not configured, analysis calls return error responses with descriptive messages.
+    """
+    def __init__(
+        self,
+        *,
+        endpoint: Optional[str] = None,
+        key: Optional[str] = None,
+        warn_if_unconfigured: bool = False,
+    ):
+        """Initialize Document Intelligence service.
+        Args:
+            endpoint: Azure Document Intelligence endpoint URL
+            key: Azure Document Intelligence API key
+            warn_if_unconfigured: Whether to log a warning if not configured
+        """
+        self.endpoint = endpoint or os.getenv("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT")
+        self.key = key or os.getenv("AZURE_DOCUMENT_INTELLIGENCE_KEY")
+        if not self.endpoint or not self.key or DocumentIntelligenceClient is None:
+            self.client = None
+            if warn_if_unconfigured:
+                logger.warning(
+                    "AzureDocumentIntelligenceService not configured "
+                    "(missing endpoint/key or azure-ai-documentintelligence SDK). "
+                    "Calls will return error responses."
+                )
+        else:
+            try:
+                self.client = DocumentIntelligenceClient(
+                    endpoint=self.endpoint, credential=AzureKeyCredential(self.key)
+                )
+            except Exception as e:
+                self.client = None
+                logger.warning(f"DocumentIntelligenceClient initialization failed: {e}")
+    async def analyze_document_from_url(
+        self, url: str, model_id: str = "prebuilt-read"
+    ) -> Dict[str, Any]:
+        """Analyze a document from a URL using Azure Document Intelligence.
+        Args:
+            url: URL of the document to analyze (must be accessible to Azure)
+            model_id: Model to use (default: "prebuilt-read" for OCR)
+                     Other options: "prebuilt-layout", "prebuilt-invoice", etc.
+        Returns:
+            Dict with analysis results:
+            - success (bool): Whether analysis succeeded
+            - content (str | None): Extracted text content
+            - pages (list[dict] | None): Page information
+            - page_count (int | None): Total number of pages
+            - confidence (float | None): Average OCR confidence (0-1)
+            - model_id (str | None): Model used
+            - metadata (dict | None): Additional metadata
+            - error (str | None): Error message if failed
+        """
+        if not self.client:
+            logger.warning("Document analysis from URL skipped: service not configured")
+            return {
+                "success": False,
+                "error": "Document Intelligence service not configured",
+            }
+        try:
+            logger.info(f"Starting document analysis from URL: {url} (model: {model_id})")
+            # Run the blocking operation in a thread pool
+            poller = await asyncio.to_thread(
+                self.client.begin_analyze_document,
+                model_id,
+                AnalyzeDocumentRequest(url_source=url),
+            )
+            # Wait for the result
+            result: AnalyzeResult = await asyncio.to_thread(poller.result)
+            logger.info(
+                f"Document analysis completed (model: {model_id}, pages: {len(result.pages or [])})"
+            )
+            return self._format_result(result, model_id)
+        except ClientAuthenticationError as e:
+            logger.error(f"Authentication failed for document analysis: {e}")
+            return {"success": False, "error": f"Authentication failed: {e}"}
+        except HttpResponseError as e:
+            logger.error(f"Azure service error analyzing document: {e.status_code} - {e.message}")
+            return {
+                "success": False,
+                "error": f"Azure service error ({e.status_code}): {e.message}",
+            }
+        except ServiceRequestError as e:
+            logger.error(f"Network error analyzing document: {e}")
+            return {"success": False, "error": f"Network error: {e}"}
+        except Exception as e:
+            logger.error(f"Unexpected error analyzing document from URL: {e}")
+            return {"success": False, "error": f"Unexpected error: {e}"}
+    async def analyze_document_from_bytes(
+        self, file_content: bytes, model_id: str = "prebuilt-read"
+    ) -> Dict[str, Any]:
+        """Analyze a document from bytes using Azure Document Intelligence.
+        Args:
+            file_content: Document content as bytes (PDF, image, etc.)
+            model_id: Model to use (default: "prebuilt-read" for OCR)
+        Returns:
+            Dict with analysis results (same format as analyze_document_from_url)
+        """
+        if not self.client:
+            logger.warning("Document analysis from bytes skipped: service not configured")
+            return {
+                "success": False,
+                "error": "Document Intelligence service not configured",
+            }
+        try:
+            logger.info(
+                f"Starting document analysis from bytes (size: {len(file_content)} bytes, model: {model_id})"
+            )
+            # Create a file-like object from bytes
+            file_stream = io.BytesIO(file_content)
+            # Run the blocking operation in a thread pool
+            poller = await asyncio.to_thread(
+                self.client.begin_analyze_document,
+                model_id,
+                body=file_stream,
+            )
+            # Wait for the result
+            result: AnalyzeResult = await asyncio.to_thread(poller.result)
+            logger.info(
+                f"Document analysis completed (model: {model_id}, pages: {len(result.pages or [])})"
+            )
+            return self._format_result(result, model_id)
+        except ClientAuthenticationError as e:
+            logger.error(f"Authentication failed for document analysis: {e}")
+            return {"success": False, "error": f"Authentication failed: {e}"}
+        except HttpResponseError as e:
+            logger.error(f"Azure service error analyzing document: {e.status_code} - {e.message}")
+            return {
+                "success": False,
+                "error": f"Azure service error ({e.status_code}): {e.message}",
+            }
+        except ServiceRequestError as e:
+            logger.error(f"Network error analyzing document: {e}")
+            return {"success": False, "error": f"Network error: {e}"}
+        except Exception as e:
+            logger.error(f"Unexpected error analyzing document from bytes: {e}")
+            return {"success": False, "error": f"Unexpected error: {e}"}
+    def _format_result(self, result: AnalyzeResult, model_id: str) -> Dict[str, Any]:
+        """Format the AnalyzeResult into a dict response.
+        Args:
+            result: Azure Document Intelligence AnalyzeResult
+            model_id: Model ID used for analysis
+        Returns:
+            Formatted dict with extracted content and metadata
+        """
+        # Extract all text content
+        content_parts: list[str] = []
+        pages_info: list[Dict[str, Any]] = []
+        total_confidence = 0.0
+        confidence_count = 0
+        if result.pages:
+            for page in result.pages:
+                # Collect page info
+                page_info = {
+                    "page_number": page.page_number,
+                    "width": page.width,
+                    "height": page.height,
+                    "unit": page.unit,
+                    "lines_count": len(page.lines or []),
+                    "words_count": len(page.words or []),
+                }
+                pages_info.append(page_info)
+                # Extract text from lines
+                if page.lines:
+                    for line in page.lines:
+                        content_parts.append(line.content)
+                        # Track confidence if available
+                        if hasattr(line, "confidence") and line.confidence is not None:
+                            total_confidence += line.confidence
+                            confidence_count += 1
+        # Combine all content with newlines
+        full_content = "\n".join(content_parts)
+        # Calculate average confidence
+        avg_confidence = total_confidence / confidence_count if confidence_count > 0 else None
+        # Build metadata
+        metadata: Dict[str, Any] = {
+            "content_format": (
+                result.content_format if hasattr(result, "content_format") else None
+            ),
+            "api_version": result.api_version if hasattr(result, "api_version") else None,
+        }
+        # Add languages if detected
+        if hasattr(result, "languages") and result.languages:
+            metadata["languages"] = [
+                {"locale": lang.locale, "confidence": lang.confidence} for lang in result.languages
+            ]
+        # Add styles if detected (e.g., handwriting)
+        if hasattr(result, "styles") and result.styles:
+            metadata["has_handwriting"] = any(
+                style.is_handwritten for style in result.styles if hasattr(style, "is_handwritten")
+            )
+        return {
+            "success": True,
+            "content": full_content if full_content else None,
+            "pages": pages_info if pages_info else None,
+            "page_count": len(pages_info) if pages_info else None,
+            "confidence": avg_confidence,
+            "model_id": model_id,
+            "metadata": metadata,
+        }
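Both analysis methods in `document.py` push a blocking "begin, then wait on the poller" sequence onto worker threads with `asyncio.to_thread` and convert failures into `{"success": False, ...}` dicts. The sketch below reproduces that pattern with stand-ins so it runs without Azure; `FakePoller`, `begin_fake_analysis`, `analyze`, and `run_demo` are hypothetical names, not Azure or package APIs.

```python
import asyncio
import time


class FakePoller:
    """Stand-in for a long-running-operation poller; not an Azure class."""

    def __init__(self, value: str) -> None:
        self._value = value

    def result(self) -> str:
        time.sleep(0.05)  # simulate a blocking wait for the service
        return self._value


def begin_fake_analysis(model_id: str) -> FakePoller:
    """Hypothetical blocking call that starts an operation and returns a poller."""
    return FakePoller(f"analyzed with {model_id}")


async def analyze(model_id: str) -> dict:
    """Run both blocking steps off the event loop and never raise to the caller."""
    try:
        poller = await asyncio.to_thread(begin_fake_analysis, model_id)
        content = await asyncio.to_thread(poller.result)
        return {"success": True, "content": content}
    except Exception as e:
        return {"success": False, "error": f"Unexpected error: {e}"}


def run_demo() -> dict:
    return asyncio.run(analyze("prebuilt-read"))
```

Keeping both steps in worker threads means other coroutines keep running while the service processes the document.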

srx_lib_azure-0.3.0/src/srx_lib_azure/speech.py ADDED

@@ -0,0 +1,296 @@
+import asyncio
+import os
+import subprocess
+import tempfile
+from pathlib import Path
+from typing import Optional
+from loguru import logger
+# Optional import - gracefully handle if azure-cognitiveservices-speech is not installed
+try:
+    import azure.cognitiveservices.speech as speechsdk
+    SPEECH_SDK_AVAILABLE = True
+except ImportError:
+    SPEECH_SDK_AVAILABLE = False
+    logger.warning(
+        "azure-cognitiveservices-speech not installed. "
+        "Install with: pip install srx-lib-azure[speech]"
+    )
+class AzureSpeechService:
+    """Azure Speech Service for audio transcription.
+    Provides audio-to-text transcription using Azure Cognitive Services Speech SDK.
+    Supports continuous recognition for longer audio files and language selection.
+    Configuration can be passed explicitly via constructor or fallback to environment variables.
+    Operations will error if SDK is not installed or required credentials are missing.
+    """
+    def __init__(
+        self,
+        *,
+        speech_key: Optional[str] = None,
+        speech_region: Optional[str] = None,
+        speech_endpoint: Optional[str] = None,
+        warn_if_unconfigured: bool = False,
+    ) -> None:
+        """Initialize Azure Speech Service.
+        Args:
+            speech_key: Azure Speech API key (falls back to AZURE_SPEECH_KEY env var)
+            speech_region: Azure region (falls back to AZURE_SPEECH_REGION env var)
+            speech_endpoint: Optional custom endpoint (falls back to AZURE_SPEECH_ENDPOINT env var)
+            warn_if_unconfigured: Whether to warn at initialization if not configured
+        """
+        self.speech_key = speech_key or os.getenv("AZURE_SPEECH_KEY")
+        self.speech_region = speech_region or os.getenv("AZURE_SPEECH_REGION")
+        self.speech_endpoint = speech_endpoint or os.getenv("AZURE_SPEECH_ENDPOINT")
+        if warn_if_unconfigured and not self.speech_key:
+            logger.warning(
+                "Azure Speech credentials not configured; transcription operations may fail."
+            )
+    def _check_availability(self) -> None:
+        """Check if Speech SDK is available and credentials are configured."""
+        if not SPEECH_SDK_AVAILABLE:
+            raise RuntimeError(
+                "azure-cognitiveservices-speech package not installed. "
+                "Install with: pip install srx-lib-azure[speech]"
+            )
+        if not self.speech_key:
+            raise RuntimeError(
+                "Azure Speech credentials not configured. "
+                "Provide speech_key or set AZURE_SPEECH_KEY environment variable."
+            )
+        if not self.speech_region and not self.speech_endpoint:
+            raise RuntimeError(
+                "Azure Speech region or endpoint not configured. "
+                "Provide speech_region or speech_endpoint, or set AZURE_SPEECH_REGION environment variable."
+            )
+    def _preprocess_audio(self, input_path: str) -> str:
+        """Convert audio to 16kHz mono WAV format for optimal Azure Speech processing.
+        Args:
+            input_path: Path to input audio file
+        Returns:
+            Path to preprocessed WAV file
+        Raises:
+            RuntimeError: If ffmpeg is not available or conversion fails
+        """
+        try:
+            # Check if ffmpeg is available
+            subprocess.run(
+                ["ffmpeg", "-version"],
+                capture_output=True,
+                check=True,
+            )
+        except (subprocess.CalledProcessError, FileNotFoundError) as e:
+            raise RuntimeError(
+                "ffmpeg not found. Please install ffmpeg for audio preprocessing."
+            ) from e
+        # Create temporary WAV file
+        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tf:
+            output_path = tf.name
+        try:
+            # Convert to 16kHz mono WAV
+            subprocess.run(
+                [
+                    "ffmpeg",
+                    "-i",
+                    input_path,
+                    "-ar",
+                    "16000",  # 16kHz sample rate
+                    "-ac",
+                    "1",  # Mono
+                    "-y",  # Overwrite output file
+                    output_path,
+                ],
+                capture_output=True,
+                check=True,
+            )
+            logger.info(f"Preprocessed audio: {input_path} -> {output_path}")
+            return output_path
+        except subprocess.CalledProcessError as e:
+            # Clean up on error
+            if os.path.exists(output_path):
+                os.unlink(output_path)
+            raise RuntimeError(f"Audio preprocessing failed: {e.stderr.decode()}") from e
+    async def transcribe_audio_to_markdown(
+        self,
+        audio_path: str,
+        language: str = "id-ID",
+        preprocess: bool = True,
+    ) -> str:
+        """Transcribe audio file to markdown-formatted text.
+        Args:
+            audio_path: Path to audio file (mp3, m4a, wav, etc.)
+            language: BCP-47 language code (default: 'id-ID' for Indonesian)
+                     Common codes: 'en-US', 'id-ID', 'ms-MY', 'zh-CN', 'ja-JP'
+            preprocess: Whether to preprocess audio to 16kHz mono WAV (recommended)
+        Returns:
+            Markdown-formatted transcription text
+        Raises:
+            RuntimeError: If SDK not available, credentials missing, or transcription fails
+        """
+        self._check_availability()
+        # Preprocess audio if requested
+        wav_path = audio_path
+        cleanup_wav = False
+        if preprocess:
+            wav_path = self._preprocess_audio(audio_path)
+            cleanup_wav = True
+        try:
+            # Configure Azure Speech
+            if self.speech_endpoint:
+                speech_config = speechsdk.SpeechConfig(
+                    subscription=self.speech_key,
+                    endpoint=self.speech_endpoint,
+                )
+            else:
+                speech_config = speechsdk.SpeechConfig(
+                    subscription=self.speech_key,
+                    region=self.speech_region,
+                )
+            # Configure audio input
+            audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
+            # Create speech recognizer with language
+            recognizer = speechsdk.SpeechRecognizer(
+                speech_config=speech_config,
+                audio_config=audio_config,
+                language=language,
+            )
+            # Event-driven continuous recognition
+            paragraphs: list[str] = []
+            current: list[str] = []
+            loop = asyncio.get_running_loop()
+            done = loop.create_future()
+            def resolve(exc: Optional[Exception] = None) -> None:
+                """Complete the future; always runs on the event loop thread."""
+                if done.done():
+                    return
+                if exc is not None:
+                    done.set_exception(exc)
+                else:
+                    done.set_result(True)
+            def recognizing_handler(evt):
+                """Handle intermediate recognition results."""
+                if evt.result.reason == speechsdk.ResultReason.RecognizingSpeech:
+                    logger.debug(f"Recognizing: {evt.result.text}")
+            def recognized_handler(evt):
+                """Handle final recognition results."""
+                if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
+                    text = evt.result.text.strip()
+                    if text:
+                        current.append(text)
+                        logger.debug(f"Recognized: {text}")
+                elif evt.result.reason == speechsdk.ResultReason.NoMatch:
+                    logger.debug("No speech recognized")
+            def session_stopped(evt):
+                """Handle session stop."""
+                logger.info("Session stopped")
+                if current:
+                    paragraphs.append(" ".join(current))
+                # SDK callbacks fire off the event loop thread, so hand off to the loop
+                loop.call_soon_threadsafe(resolve)
+            def canceled(evt):
+                """Handle cancellation."""
+                if evt.reason == speechsdk.CancellationReason.Error:
+                    error_msg = f"Transcription error: {evt.error_details}"
+                    logger.error(error_msg)
+                    loop.call_soon_threadsafe(resolve, RuntimeError(error_msg))
+                else:
+                    logger.info("Transcription canceled")
+                    loop.call_soon_threadsafe(resolve)
+            # Connect event handlers
+            recognizer.recognizing.connect(recognizing_handler)
+            recognizer.recognized.connect(recognized_handler)
+            recognizer.session_stopped.connect(session_stopped)
+            recognizer.canceled.connect(canceled)
+            # Start continuous recognition
+            logger.info(f"Starting transcription for {audio_path} (language: {language})")
+            recognizer.start_continuous_recognition_async().get()
+            # Wait for completion (max 15 minutes timeout)
+            try:
+                await asyncio.wait_for(done, timeout=900)
+            except asyncio.TimeoutError:
+                raise RuntimeError("Transcription timeout (15 minutes exceeded)")
+            # Stop recognition
+            recognizer.stop_continuous_recognition_async().get()
+            # Format as markdown with bullet points
+            if not paragraphs:
+                logger.warning("No transcription results")
+                return ""
+            markdown = "\n\n".join(f"- {para}" for para in paragraphs)
+            logger.info(f"Transcription completed: {len(paragraphs)} paragraphs")
+            return markdown
+        finally:
+            # Clean up preprocessed WAV file
+            if cleanup_wav and os.path.exists(wav_path):
+                try:
+                    os.unlink(wav_path)
+                    logger.debug(f"Cleaned up temporary file: {wav_path}")
+                except Exception as e:
+                    logger.warning(f"Failed to clean up {wav_path}: {e}")
+    async def transcribe_audio_bytes(
+        self,
+        audio_bytes: bytes,
+        file_extension: str = ".mp3",
+        language: str = "id-ID",
+    ) -> str:
+        """Transcribe audio from bytes to markdown-formatted text.
+        Args:
+            audio_bytes: Audio file content as bytes
+            file_extension: File extension (for format detection)
+            language: BCP-47 language code (default: 'id-ID' for Indonesian)
+        Returns:
+            Markdown-formatted transcription text
+        Raises:
+            RuntimeError: If SDK not available, credentials missing, or transcription fails
+        """
+        # Write bytes to temporary file
+        with tempfile.NamedTemporaryFile(
+            suffix=file_extension,
+            delete=False,
+        ) as tf:
+            tf.write(audio_bytes)
+            temp_path = tf.name
+        try:
+            return await self.transcribe_audio_to_markdown(
+                temp_path,
+                language=language,
+                preprocess=True,
+            )
+        finally:
+            # Clean up temporary file
+            if os.path.exists(temp_path):
+                try:
+                    os.unlink(temp_path)
+                except Exception as e:
+                    logger.warning(f"Failed to clean up {temp_path}: {e}")
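`transcribe_audio_to_markdown` waits on an asyncio future that is completed from recognizer event handlers. When such handlers run on a thread other than the event loop's, a common way to complete the future safely is `loop.call_soon_threadsafe`. The sketch below shows that technique in isolation, with a plain `threading.Thread` standing in for the SDK callback thread; `wait_for_callback` and `run_demo` are hypothetical names, and the assumption that handlers run off the loop thread is not confirmed by the diff itself.

```python
import asyncio
import threading


async def wait_for_callback(timeout: float = 2.0) -> str:
    """Await a value delivered by a callback running on another thread."""
    loop = asyncio.get_running_loop()
    done = loop.create_future()

    def resolve(value: str) -> None:
        # Runs on the event loop thread, so touching the future is safe here.
        if not done.done():
            done.set_result(value)

    def worker() -> None:
        # Simulates an SDK callback firing on a background thread.
        loop.call_soon_threadsafe(resolve, "session stopped")

    threading.Thread(target=worker, daemon=True).start()
    return await asyncio.wait_for(done, timeout=timeout)


def run_demo() -> str:
    return asyncio.run(wait_for_callback())
```

Calling `done.set_result` directly from the worker thread is not thread-safe, and the awaiting coroutine may not wake promptly; scheduling through the loop avoids that.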

srx_lib_azure-0.1.8/PKG-INFO DELETED

@@ -1,70 +0,0 @@
-Metadata-Version: 2.4
-Name: srx-lib-azure
-Version: 0.1.8
-Summary: Azure helpers for SRX services: Blob, Email, Table
-Author-email: SRX <dev@srx.id>
-Requires-Python: >=3.12
-Requires-Dist: azure-communication-email>=1.0.0
-Requires-Dist: azure-data-tables>=12.7.0
-Requires-Dist: azure-storage-blob>=12.22.0
-Requires-Dist: loguru>=0.7.2
-Description-Content-Type: text/markdown
-# srx-lib-azure
-Lightweight wrappers over Azure SDKs used across SRX services.
-What it includes:
-- Blob: upload/download helpers, SAS URL generation
-- Email (Azure Communication Services): simple async sender
-- Table: simple CRUD helpers
-## Install
-PyPI (public):
-- `pip install srx-lib-azure`
-uv (pyproject):
-```
-[project]
-dependencies = ["srx-lib-azure>=0.1.0"]
-```
-## Usage
-Blob:
-```
-from srx_lib_azure.blob import AzureBlobService
-blob = AzureBlobService()
-url = await blob.upload_file(upload_file, "documents/report.pdf")
-```
-Email:
-```
-from srx_lib_azure.email import EmailService
-svc = EmailService()
-await svc.send_notification("user@example.com", "Subject", "Hello", html=False)
-```
-Table:
-```
-from srx_lib_azure.table import AzureTableService
-store = AzureTableService()
-store.ensure_table("events")
-store.upsert_entity("events", {"PartitionKey":"p","RowKey":"r","EventType":"x"})
-```
-## Environment Variables
-- Blob & Table: `AZURE_STORAGE_CONNECTION_STRING` (required)
-- Email (ACS): `ACS_CONNECTION_STRING`, `EMAIL_SENDER`
-- Optional: `AZURE_STORAGE_ACCOUNT_KEY`, `AZURE_BLOB_URL`, `AZURE_SAS_TOKEN`
-## Release
-Tag `vX.Y.Z` to publish to GitHub Packages via Actions.
-## License
-Proprietary © SRX

srx_lib_azure-0.1.8/README.md DELETED

@@ -1,58 +0,0 @@
-# srx-lib-azure
-Lightweight wrappers over Azure SDKs used across SRX services.
-What it includes:
-- Blob: upload/download helpers, SAS URL generation
-- Email (Azure Communication Services): simple async sender
-- Table: simple CRUD helpers
-## Install
-PyPI (public):
-- `pip install srx-lib-azure`
-uv (pyproject):
-```
-[project]
-dependencies = ["srx-lib-azure>=0.1.0"]
-```
-## Usage
-Blob:
-```
-from srx_lib_azure.blob import AzureBlobService
-blob = AzureBlobService()
-url = await blob.upload_file(upload_file, "documents/report.pdf")
-```
-Email:
-```
-from srx_lib_azure.email import EmailService
-svc = EmailService()
-await svc.send_notification("user@example.com", "Subject", "Hello", html=False)
-```
-Table:
-```
-from srx_lib_azure.table import AzureTableService
-store = AzureTableService()
-store.ensure_table("events")
-store.upsert_entity("events", {"PartitionKey":"p","RowKey":"r","EventType":"x"})
-```
-## Environment Variables
-- Blob & Table: `AZURE_STORAGE_CONNECTION_STRING` (required)
-- Email (ACS): `ACS_CONNECTION_STRING`, `EMAIL_SENDER`
-- Optional: `AZURE_STORAGE_ACCOUNT_KEY`, `AZURE_BLOB_URL`, `AZURE_SAS_TOKEN`
-## Release
-Tag `vX.Y.Z` to publish to GitHub Packages via Actions.
-## License
-Proprietary © SRX

srx_lib_azure-0.1.8/src/srx_lib_azure/__init__.py DELETED

@@ -1,5 +0,0 @@
-from .blob import AzureBlobService
-from .email import EmailService
-from .table import AzureTableService
-__all__ = ["AzureBlobService", "EmailService", "AzureTableService"]

{srx_lib_azure-0.1.8 → srx_lib_azure-0.3.0}/.github/workflows/publish.yml RENAMED

File without changes

{srx_lib_azure-0.1.8 → srx_lib_azure-0.3.0}/.gitignore RENAMED

File without changes

{srx_lib_azure-0.1.8 → srx_lib_azure-0.3.0}/src/srx_lib_azure/blob.py RENAMED

File without changes

{srx_lib_azure-0.1.8 → srx_lib_azure-0.3.0}/src/srx_lib_azure/email.py RENAMED

File without changes

{srx_lib_azure-0.1.8 → srx_lib_azure-0.3.0}/src/srx_lib_azure/table.py RENAMED

File without changes
