PyPI - synapse-sdk - Versions diffs - 2025.9.5__py3-none-any.whl → 2025.10.6__py3-none-any.whl - Mend

synapse-sdk 2025.9.5__py3-none-any.whl → 2025.10.6__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of synapse-sdk might be problematic.

Files changed (78)

synapse_sdk/clients/base.py CHANGED

@@ -250,19 +250,139 @@ class BaseClient:
         else:
             return response
-    def _list(self, path, url_conversion=None, list_all=False, **kwargs):
-        response = self._get(path, **kwargs)
+    def _list(self, path, url_conversion=None, list_all=False, params=None, **kwargs):
+        """List resources from a paginated API endpoint.
+        Args:
+            path (str): URL path to request.
+            url_conversion (dict, optional): Configuration for URL to path conversion.
+                Used to convert file URLs to local paths in the response.
+                Example: {'files_fields': ['files'], 'is_list': True}
+                This will convert file URLs in the 'files' field of each result.
+            list_all (bool): If True, returns a generator yielding all results across all pages.
+                Default is False, which returns only the first page.
+            params (dict, optional): Query parameters to pass to the request.
+                Example: {'status': 'active', 'project': 123}
+            **kwargs: Additional keyword arguments to pass to the request.
+        Returns:
+            If list_all is False: dict response from the API containing:
+                - 'results': list of items on the current page
+                - 'count': total number of items
+                - 'next': URL to the next page (or None)
+                - 'previous': URL to the previous page (or None)
+            If list_all is True: tuple of (generator, count) where:
+                - generator: yields individual items from all pages
+                - count: total number of items across all pages
+        Examples:
+            Get first page only:
+            >>> response = client._list('api/tasks/')
+            >>> tasks = response['results']  # List of tasks on first page
+            >>> total_count = response['count']  # Total number of tasks
+            Get all results across all pages:
+            >>> generator, count = client._list('api/tasks/', list_all=True)
+            >>> all_tasks = list(generator)  # Fetches all pages
+            With filters and url_conversion:
+            >>> url_conversion = {'files_fields': ['files'], 'is_list': True}
+            >>> params = {'status': 'active'}
+            >>> generator, count = client._list(
+            ...     'api/data_units/',
+            ...     url_conversion=url_conversion,
+            ...     list_all=True,
+            ...     params=params
+            ... )
+            >>> active_units = list(generator)  # All active units with file URLs converted
+        """
+        if params is None:
+            params = {}
         if list_all:
-            return self._list_all(path, url_conversion, **kwargs), response['count']
+            response = self._get(path, params=params, **kwargs)
+            return self._list_all(path, url_conversion, params=params, **kwargs), response.get('count')
         else:
+            response = self._get(path, params=params, **kwargs)
             return response
-    def _list_all(self, path, url_conversion=None, params={}, **kwargs):
-        params['page_size'] = self.page_size
-        response = self._get(path, url_conversion, params=params, **kwargs)
-        yield from response['results']
-        if response['next']:
-            yield from self._list_all(response['next'], url_conversion, **kwargs)
+    def _list_all(self, path, url_conversion=None, params=None, **kwargs):
+        """Generator that yields all results from a paginated API endpoint.
+        This method handles pagination automatically by following the 'next' URLs
+        returned by the API until all pages have been fetched. It uses an iterative
+        approach (while loop) instead of recursion to avoid stack overflow with
+        deep pagination.
+        Args:
+            path (str): Initial URL path to request.
+            url_conversion (dict, optional): Configuration for URL to path conversion.
+                Applied to all pages. Common structure:
+                - 'files_fields': List of field names containing file URLs
+                - 'is_list': Whether the response is a list (True for paginated results)
+                Example: {'files_fields': ['files', 'images'], 'is_list': True}
+            params (dict, optional): Query parameters for the first request only.
+                Subsequent requests use the 'next' URL which already includes
+                all necessary parameters. If 'page_size' is not specified,
+                it defaults to self.page_size (100).
+                Example: {'status': 'active', 'page_size': 50}
+            **kwargs: Additional keyword arguments to pass to requests.
+                Example: timeout, headers, etc.
+        Yields:
+            dict: Individual result items from all pages. Each item is yielded
+                as soon as it's fetched, allowing for memory-efficient processing
+                of large datasets.
+        Examples:
+            Basic usage - fetch all tasks:
+            >>> for task in client._list_all('api/tasks/'):
+            ...     process_task(task)
+            With filters:
+            >>> params = {'status': 'pending', 'priority': 'high'}
+            >>> for task in client._list_all('api/tasks/', params=params):
+            ...     print(task['id'])
+            With url_conversion for file fields:
+            >>> url_conversion = {'files_fields': ['files'], 'is_list': True}
+            >>> for data_unit in client._list_all('api/data_units/', url_conversion):
+            ...     # File URLs in 'files' field are converted to local paths
+            ...     print(data_unit['files'])
+            Collecting results into a list:
+            >>> all_tasks = list(client._list_all('api/tasks/'))
+            >>> print(f"Total tasks: {len(all_tasks)}")
+        Note:
+            - This is a generator function, so results are fetched lazily as you iterate
+            - The first page is fetched with the provided params
+            - Subsequent pages use the 'next' URL from the API response
+            - No duplicate page_size parameters are added to subsequent requests
+            - Memory efficient: processes one item at a time rather than loading all at once
+        """
+        if params is None:
+            params = {}
+        # Set page_size only if not already specified by user
+        request_params = params.copy()
+        if 'page_size' not in request_params:
+            request_params['page_size'] = self.page_size
+        next_url = path
+        is_first_request = True
+        while next_url:
+            # First request uses params, subsequent requests use next URL directly
+            if is_first_request:
+                response = self._get(next_url, url_conversion, params=request_params, **kwargs)
+                is_first_request = False
+            else:
+                # next URL already contains all necessary query parameters
+                response = self._get(next_url, url_conversion, **kwargs)
+            yield from response['results']
+            next_url = response.get('next')
     def exists(self, api, *args, **kwargs):
         return getattr(self, api)(*args, **kwargs)['count'] > 0
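
The hunk above replaces the recursive `_list_all` with a loop and stops writing `page_size` into the caller's `params` (the old `params={}` default was mutated in place, so the same dict was shared across calls). The following is a minimal sketch of that control flow against an in-memory stand-in. `FakePagedClient` and its `_get` are hypothetical and exist only for illustration; they are not part of synapse-sdk, and the real `_get` performs HTTP requests.

```python
from urllib.parse import parse_qs, urlparse


class FakePagedClient:
    """Hypothetical in-memory stand-in for BaseClient; makes no network calls."""

    page_size = 100

    def __init__(self, items):
        self.items = items
        self.requests = []  # records (path, params) for every _get call

    def _get(self, path, url_conversion=None, params=None, **kwargs):
        # Combine query-string values embedded in the path with explicit params,
        # roughly what an HTTP library does when it builds the final URL.
        self.requests.append((path, dict(params or {})))
        parsed = urlparse(path)
        query = {key: values[0] for key, values in parse_qs(parsed.query).items()}
        query.update(params or {})
        page = int(query.get('page', 1))
        size = int(query.get('page_size', self.page_size))
        start = (page - 1) * size
        has_more = start + size < len(self.items)
        next_url = f'{parsed.path}?page={page + 1}&page_size={size}' if has_more else None
        return {
            'results': self.items[start:start + size],
            'count': len(self.items),
            'next': next_url,
        }

    def _list_all(self, path, url_conversion=None, params=None, **kwargs):
        # Mirrors the loop in the diff: work on a copy so the caller's dict
        # is never modified, and send params only with the first request.
        request_params = dict(params or {})
        request_params.setdefault('page_size', self.page_size)
        next_url = path
        is_first_request = True
        while next_url:
            if is_first_request:
                response = self._get(next_url, url_conversion, params=request_params, **kwargs)
                is_first_request = False
            else:
                # The 'next' URL already carries page and page_size.
                response = self._get(next_url, url_conversion, **kwargs)
            yield from response['results']
            next_url = response.get('next')


client = FakePagedClient(items=list(range(7)))
user_params = {'page_size': 3}
collected = list(client._list_all('api/tasks/', params=user_params))
print(collected)
print(client.requests)
```

With seven items and a page size of three, the stand-in records three requests, and only the first one carries explicit parameters.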

synapse_sdk/devtools/docs/docs/api/clients/base.md CHANGED

@@ -6,30 +6,252 @@ sidebar_position: 3
 # BaseClient
-Base class for all Synapse SDK clients.
+Base class for all Synapse SDK clients providing core HTTP operations and pagination.
 ## Overview
-The `BaseClient` provides common functionality for HTTP operations, error handling, and request management used by all other clients.
+The `BaseClient` provides common functionality for HTTP operations, error handling, request management, and pagination used by all other clients. It implements efficient pagination handling with automatic file URL conversion capabilities.
 ## Features
 - HTTP request handling with retry logic
 - Automatic timeout management
-- File upload/download capabilities
+- Efficient pagination with generators
+- File URL to local path conversion
 - Pydantic model validation
 - Connection pooling
-## Usage
+## Core HTTP Methods
+The BaseClient provides low-level HTTP methods that are used internally by all client mixins:
+- `_get()` - GET requests with optional response model validation
+- `_post()` - POST requests with request/response validation
+- `_put()` - PUT requests with model validation
+- `_patch()` - PATCH requests with model validation
+- `_delete()` - DELETE requests with model validation
+These methods are typically not called directly. Instead, use the higher-level methods provided by client mixins.
+## Pagination Methods
+### `_list(path, url_conversion=None, list_all=False, params=None, **kwargs)`
+List resources from a paginated API endpoint with optional automatic pagination and file URL conversion.
+**Parameters:**
+- `path` (str): URL path to request
+- `url_conversion` (dict, optional): Configuration for converting file URLs to local paths
+  - Structure: `{'files_fields': ['field1', 'field2'], 'is_list': True}`
+  - Automatically downloads files and replaces URLs with local paths
+- `list_all` (bool): If True, returns all results across all pages using a generator
+- `params` (dict, optional): Query parameters (filters, sorting, etc.)
+- `**kwargs`: Additional request arguments
+**Returns:**
+- If `list_all=False`: Dict with `results`, `count`, `next`, `previous`
+- If `list_all=True`: Tuple of `(generator, total_count)`
+**Examples:**
+```python
+# Get first page only
+response = client._list('api/tasks/')
+tasks = response['results']  # First page of tasks
+total = response['count']     # Total number of tasks
+# Get all results using generator (memory efficient)
+generator, total_count = client._list('api/tasks/', list_all=True)
+all_tasks = list(generator)  # Fetches all pages automatically
+# With filters
+params = {'status': 'pending', 'priority': 'high'}
+response = client._list('api/tasks/', params=params)
+# With url_conversion for file fields
+url_conversion = {'files_fields': ['files'], 'is_list': True}
+generator, count = client._list(
+    'api/data_units/',
+    url_conversion=url_conversion,
+    list_all=True,
+    params={'status': 'active'}
+)
+# File URLs in 'files' field are automatically downloaded and converted to local paths
+for unit in generator:
+    print(unit['files'])  # Local file paths, not URLs
+```
+### `_list_all(path, url_conversion=None, params=None, **kwargs)`
+Generator that yields all results from a paginated API endpoint.
+This method is called internally by `_list()` when `list_all=True`. It handles pagination automatically by following `next` URLs and uses an iterative approach (while loop) instead of recursion to avoid stack overflow with deep pagination.
+**Key Improvements (SYN-5757):**
+1. **No duplicate page_size**: The `page_size` parameter is only added to the first request. Subsequent requests use the `next` URL directly, which already contains all necessary parameters.
+2. **Proper params handling**: User-specified query parameters are correctly passed to the first request and preserved through pagination via the `next` URL.
+3. **url_conversion on all pages**: URL conversion is applied to every page, not just the first one.
+4. **Iterative instead of recursive**: Uses a while loop instead of recursion for better memory efficiency and to prevent stack overflow on large datasets.
+**Parameters:**
+- `path` (str): Initial URL path
+- `url_conversion` (dict, optional): Applied to all pages
+- `params` (dict, optional): Query parameters for first request only
+- `**kwargs`: Additional request arguments
+**Yields:**
+Individual result items from all pages, fetched lazily.
+**Examples:**
+```python
+# Basic: iterate through all tasks
+for task in client._list_all('api/tasks/'):
+    process_task(task)
+# With filters
+params = {'status': 'pending'}
+for task in client._list_all('api/tasks/', params=params):
+    print(task['id'])
+# With url_conversion for nested file fields
+url_conversion = {'files_fields': ['data.files', 'metadata.attachments'], 'is_list': True}
+for item in client._list_all('api/items/', url_conversion=url_conversion):
+    print(item['data']['files'])  # Local paths
+# Collect all results (memory intensive for large datasets)
+all_results = list(client._list_all('api/tasks/'))
+```
+## URL Conversion for File Downloads
+The `url_conversion` parameter enables automatic downloading of files referenced by URLs in API responses. This is particularly useful when working with data units, tasks, or any resources that include file references.
+### URL Conversion Structure
+```python
+url_conversion = {
+    'files_fields': ['files', 'images', 'data.attachments'],  # Field paths
+    'is_list': True  # Whether processing a list of items
+}
+```
+- `files_fields`: List of field paths (supports dot notation for nested fields)
+- `is_list`: Set to `True` for paginated list responses
+### How It Works
+1. API returns responses with file URLs
+2. `url_conversion` identifies fields containing URLs
+3. Files are downloaded automatically to a temporary directory
+4. URLs are replaced with local file paths
+5. Your code receives responses with local paths instead of URLs
+### Examples
+```python
+# Simple file field
+url_conversion = {'files_fields': ['image_url'], 'is_list': True}
+generator, count = client._list(
+    'api/photos/',
+    url_conversion=url_conversion,
+    list_all=True
+)
+for photo in generator:
+    # photo['image_url'] is now a local Path object, not a URL
+    with open(photo['image_url'], 'rb') as f:
+        process_image(f)
+# Multiple file fields
+url_conversion = {
+    'files_fields': ['thumbnail', 'full_image', 'raw_data'],
+    'is_list': True
+}
+# Nested fields using dot notation
+url_conversion = {
+    'files_fields': ['data.files', 'metadata.preview', 'annotations.image'],
+    'is_list': True
+}
+# With async download for better performance
+from synapse_sdk.utils.file import files_url_to_path_from_objs
+results = client._list('api/data_units/')['results']
+files_url_to_path_from_objs(
+    results,
+    files_fields=['files'],
+    is_list=True,
+    is_async=True  # Download all files concurrently
+)
+```
+## Performance Considerations
+### Memory Efficiency
+When working with large datasets, use generators instead of loading all results into memory:
+```python
+# ❌ Memory intensive - loads all results
+all_tasks = list(client._list('api/tasks/', list_all=True)[0])
+# ✅ Memory efficient - processes one at a time
+generator, _ = client._list('api/tasks/', list_all=True)
+for task in generator:
+    process_task(task)
+    # Task is processed and can be garbage collected
+```
+### Pagination Best Practices
+1. **Use list_all=True** for datasets larger than one page
+2. **Set appropriate page_size** in params if default (100) isn't optimal
+3. **Use url_conversion** only when you need to process files
+4. **Consider async downloads** for multiple files per item
+```python
+# Optimal pagination for large dataset
+params = {'page_size': 50}  # Smaller pages for faster first response
+generator, total = client._list(
+    'api/large_dataset/',
+    list_all=True,
+    params=params
+)
+# Process with progress tracking
+from tqdm import tqdm
+for item in tqdm(generator, total=total):
+    process_item(item)
+```
+## Usage in Client Mixins
+The BaseClient pagination methods are used internally by all client mixins:
 ```python
-from synapse_sdk.clients.base import BaseClient
+# DataCollectionClientMixin
+def list_data_units(self, params=None, url_conversion=None, list_all=False):
+    return self._list('data_units/', params=params,
+                     url_conversion=url_conversion, list_all=list_all)
-# BaseClient is typically not used directly
-# Use BackendClient or AgentClient instead
+# AnnotationClientMixin
+def list_tasks(self, params=None, url_conversion=None, list_all=False):
+    return self._list('sdk/tasks/', params=params,
+                     url_conversion=url_conversion, list_all=list_all)
 ```
 ## See Also
 - [BackendClient](./backend.md) - Main client implementation
-- [AgentClient](./agent.md) - Agent-specific operations
+- [AgentClient](./agent.md) - Agent-specific operations
+- [DataCollectionClientMixin](./data-collection-mixin.md) - Data and file operations
+- [AnnotationClientMixin](./annotation-mixin.md) - Task and annotation management

synapse_sdk/devtools/docs/docs/api/plugins/models.md CHANGED

@@ -50,10 +50,65 @@ Execution context for plugin actions.
 def start(self):
     # Log messages
     self.run.log("Processing started")
     # Update progress
     self.run.set_progress(0.5)
     # Set metrics
     self.run.set_metrics({"processed": 100})
-```
+```
+### Development Logging
+The `Run` class includes a specialized logging system for plugin developers with the `log_dev_event()` method and `DevLog` model.
+#### DevLog Model
+Structured model for development event logging:
+```python
+from synapse_sdk.shared.enums import Context
+class DevLog(BaseModel):
+    event_type: str          # Event category (automatically generated as '{action_name}_dev_log')
+    message: str             # Descriptive message
+    data: dict | None        # Optional additional data
+    level: Context           # Event severity level
+    created: str             # ISO timestamp
+```
+#### log_dev_event Method
+Log custom development events for debugging and monitoring:
+```python
+def start(self):
+    # Basic event logging (event_type automatically set to '{action_name}_dev_log')
+    self.run.log_dev_event('Data validation completed', {'records_count': 100})
+    # Performance tracking
+    self.run.log_dev_event('Processing time recorded', {'duration_ms': 1500})
+    # Debug with warning level
+    self.run.log_dev_event('Variable state checkpoint',
+                          {'variable_x': 42}, level=Context.WARNING)
+    # Simple event without data
+    self.run.log_dev_event('Plugin initialization complete')
+```
+**Parameters:**
+- `message` (str): Human-readable description
+- `data` (dict, optional): Additional context data
+- `level` (Context, optional): Event severity (default: Context.INFO)
+**Note:** The `event_type` is automatically generated as `{action_name}_dev_log` and cannot be modified by plugin developers.
+**Use Cases:**
+- **Debugging**: Track variable states and execution flow
+- **Performance**: Record processing times and resource usage
+- **Validation**: Log data validation results
+- **Error Tracking**: Capture detailed error information
+- **Progress Monitoring**: Record intermediate states in long-running tasks
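
The added documentation states that `event_type` is generated as `{action_name}_dev_log` and that `level` defaults to `Context.INFO`. The sketch below shows how a record with those fields might be assembled. It uses standard-library dataclasses rather than Pydantic so it runs without dependencies; the `Context` member values and the `build_dev_log` helper are assumptions made for this example and are not taken from synapse-sdk.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Context(Enum):
    # Stand-in for synapse_sdk.shared.enums.Context; member values are assumed.
    INFO = 'info'
    WARNING = 'warning'


@dataclass
class DevLog:
    event_type: str        # '{action_name}_dev_log'
    message: str           # Descriptive message
    data: Optional[dict]   # Optional additional data
    level: Context         # Event severity level
    created: str           # ISO timestamp


def build_dev_log(action_name, message, data=None, level=Context.INFO):
    """Hypothetical helper: derive event_type from the action name."""
    return DevLog(
        event_type=f'{action_name}_dev_log',
        message=message,
        data=data,
        level=level,
        created=datetime.now(timezone.utc).isoformat(),
    )


entry = build_dev_log('upload', 'Data validation completed', {'records_count': 100})
warning = build_dev_log('upload', 'Variable state checkpoint', {'variable_x': 42}, level=Context.WARNING)
print(entry.event_type, entry.level.name, warning.level.name)
```

Because the caller supplies only the message, data, and level, the event category stays tied to the action name, which matches the note that plugin developers cannot modify it.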
