avtomatika-worker 1.0b3__tar.gz → 1.0b5__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/LICENSE +1 -1
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/PKG-INFO +82 -21
- avtomatika_worker-1.0b3/src/avtomatika_worker.egg-info/PKG-INFO → avtomatika_worker-1.0b5/README.md +76 -48
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/pyproject.toml +11 -1
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/src/avtomatika_worker/__init__.py +1 -1
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/src/avtomatika_worker/config.py +6 -0
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/src/avtomatika_worker/s3.py +76 -48
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/src/avtomatika_worker/task_files.py +60 -2
- avtomatika_worker-1.0b5/src/avtomatika_worker/types.py +46 -0
- avtomatika_worker-1.0b5/src/avtomatika_worker/worker.py +704 -0
- avtomatika_worker-1.0b3/README.md → avtomatika_worker-1.0b5/src/avtomatika_worker.egg-info/PKG-INFO +109 -20
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/src/avtomatika_worker.egg-info/SOURCES.txt +2 -3
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/src/avtomatika_worker.egg-info/requires.txt +1 -0
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/tests/test_dependency_injection.py +3 -4
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/tests/test_parameter_typing.py +15 -15
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/tests/test_s3.py +22 -12
- avtomatika_worker-1.0b5/tests/test_task_files_extended.py +60 -0
- avtomatika_worker-1.0b5/tests/test_validation.py +57 -0
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/tests/test_worker_logic.py +38 -25
- avtomatika_worker-1.0b5/tests/test_worker_more_logic.py +211 -0
- avtomatika_worker-1.0b5/tests/test_worker_sdk.py +281 -0
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/tests/test_wrr_logic.py +4 -3
- avtomatika_worker-1.0b3/src/avtomatika_worker/client.py +0 -93
- avtomatika_worker-1.0b3/src/avtomatika_worker/constants.py +0 -22
- avtomatika_worker-1.0b3/src/avtomatika_worker/types.py +0 -21
- avtomatika_worker-1.0b3/src/avtomatika_worker/worker.py +0 -526
- avtomatika_worker-1.0b3/tests/test_client.py +0 -52
- avtomatika_worker-1.0b3/tests/test_worker_more_logic.py +0 -310
- avtomatika_worker-1.0b3/tests/test_worker_sdk.py +0 -342
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/setup.cfg +0 -0
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/src/avtomatika_worker/py.typed +0 -0
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/src/avtomatika_worker.egg-info/dependency_links.txt +0 -0
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/src/avtomatika_worker.egg-info/top_level.txt +0 -0
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/tests/test_concurrency_limits.py +0 -0
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/tests/test_config.py +0 -0
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/tests/test_init.py +0 -0
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/tests/test_per_orchestrator_token.py +0 -0
- {avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/tests/test_types.py +0 -0
{avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/LICENSE
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2025 Dmitrii Gagarin
+Copyright (c) 2025-2026 Dmitrii Gagarin aka madgagarin
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
{avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/PKG-INFO
@@ -1,16 +1,21 @@
 Metadata-Version: 2.4
 Name: avtomatika-worker
-Version: 1.0b3
+Version: 1.0b5
 Summary: Worker SDK for the Avtomatika orchestrator.
+Author-email: Dmitrii Gagarin <madgagarin@gmail.com>
 Project-URL: Homepage, https://github.com/avtomatika-ai/avtomatika-worker
 Project-URL: Bug Tracker, https://github.com/avtomatika-ai/avtomatika-worker/issues
+Keywords: worker,sdk,orchestrator,distributed,task-queue,rxon,hln
 Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
 Classifier: Programming Language :: Python :: 3
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Operating System :: OS Independent
+Classifier: Typing :: Typed
 Requires-Python: >=3.11
 Description-Content-Type: text/markdown
 License-File: LICENSE
+Requires-Dist: rxon==1.0b2
 Requires-Dist: aiohttp~=3.13.2
 Requires-Dist: python-json-logger~=4.0.0
 Requires-Dist: obstore>=0.1
@@ -28,7 +33,11 @@ Dynamic: license-file
 
 # Avtomatika Worker SDK
 
-
+[](https://opensource.org/licenses/MIT)
+[](https://www.python.org/downloads/release/python-3110/)
+[](https://github.com/astral-sh/ruff)
+
+This is the official SDK for creating workers compatible with the **[Avtomatika Orchestrator](https://github.com/avtomatika-ai/avtomatika)**. It is built upon the **[Avtomatika Protocol](https://github.com/avtomatika-ai/rxon)** and implements the **[HLN Protocol](https://github.com/avtomatika-ai/hln)**, handling all communication complexity (polling, heartbeats, S3 offloading) so you can focus on writing your business logic.
 
 ## Installation
 
@@ -286,13 +295,26 @@ async def image_resizer(params: ResizeParams, **kwargs):
 
 ### 1. Task Handlers
 
-Each handler is
+Each handler is a function (either `async def` or `def`) that accepts two arguments:
 
 - `params` (`dict`, `dataclass`, or `pydantic.BaseModel`): The parameters for the task, automatically validated and instantiated based on your type hint.
 - `**kwargs`: Additional metadata about the task, including:
   - `task_id` (`str`): The unique ID of the task itself.
   - `job_id` (`str`): The ID of the parent `Job` to which the task belongs.
   - `priority` (`int`): The execution priority of the task.
+  - `send_progress` (`callable`): An async function `await send_progress(progress_float, message_string)` to report task execution progress (0.0 to 1.0) to the orchestrator.
+
+**Synchronous Handlers:**
+If you define your handler as a standard synchronous function (`def handler(...)`), the SDK will automatically execute it in a separate thread using `asyncio.to_thread`. This ensures that CPU-intensive operations (like model inference) do not block the worker's main event loop, allowing heartbeats and other background tasks to continue running smoothly.
+
+```python
+@worker.task("cpu_heavy_task")
+def heavy_computation(params: dict, **kwargs):
+    # This will run in a thread, not blocking the loop
+    import time
+    time.sleep(10)
+    return {"status": "success"}
+```
 
 ### 2. Concurrency Limiting
 
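The `send_progress` callable added in the hunk above is easiest to see in use. A minimal sketch, assuming the `worker` instance from the README's setup examples and a hypothetical `do_chunk()` unit of work:

```python
# Sketch only: `worker` is the SDK instance from the README's setup code,
# and do_chunk() is a hypothetical unit of work, not part of the SDK.
@worker.task("long_render")
async def long_render(params: dict, send_progress, **kwargs):
    total = 10
    for i in range(total):
        await do_chunk(i)  # hypothetical work item
        # Report fractional progress (0.0-1.0) plus a human-readable message
        await send_progress((i + 1) / total, f"chunk {i + 1}/{total} done")
    return {"status": "success"}
```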
@@ -383,7 +405,7 @@ return {
 
 #### Error Handling
 
-To control the orchestrator's fault tolerance mechanism, you can return standardized error types.
+To control the orchestrator's fault tolerance mechanism, you can return standardized error types. All error constants can be imported from `avtomatika_worker.typing`.
 
 - **Transient Error (`TRANSIENT_ERROR`)**: For issues that might be resolved on a retry (e.g., a network failure).
   ```python
@@ -396,17 +418,10 @@ To control the orchestrator's fault tolerance mechanism, you can return standard
       }
   }
   ```
-- **Permanent Error (`PERMANENT_ERROR`)**: For unresolvable problems (e.g., an invalid file format).
-
-
-
-      "status": "failure",
-      "error": {
-          "code": PERMANENT_ERROR,
-          "message": "Corrupted input file"
-      }
-  }
-  ```
+- **Permanent Error (`PERMANENT_ERROR`)**: For unresolvable problems (e.g., an invalid file format). Causes immediate quarantine.
+- **Security Error (`SECURITY_ERROR`)**: For security violations. Causes immediate quarantine.
+- **Dependency Error (`DEPENDENCY_ERROR`)**: For missing models or tools. Causes immediate quarantine.
+- **Resource Exhausted (`RESOURCE_EXHAUSTED_ERROR`)**: When resources are temporarily unavailable. Treated as transient (retried).
 
 ### 4. Failover and Load Balancing
 
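The new quarantine-vs-retry semantics above map directly onto handler return values. A minimal sketch, assuming the constants live in the new `src/avtomatika_worker/types.py` from the file list (the README text says `avtomatika_worker.typing`, so treat the exact import path as an assumption):

```python
# Sketch only: import path assumed from the new types.py in the file list;
# `worker` and read_input() are from the README's examples / hypothetical.
from avtomatika_worker.types import PERMANENT_ERROR, TRANSIENT_ERROR

@worker.task("convert")
async def convert(params: dict, **kwargs):
    try:
        data = read_input(params["path"])  # hypothetical helper
    except ValueError:
        # Unrecoverable input -> immediate quarantine, no retry
        return {"status": "failure",
                "error": {"code": PERMANENT_ERROR, "message": "Corrupted input file"}}
    except TimeoutError:
        # May succeed on retry -> the orchestrator re-schedules the task
        return {"status": "failure",
                "error": {"code": TRANSIENT_ERROR, "message": "Upstream timeout"}}
    return {"status": "success", "result": {"bytes": len(data)}}
```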
@@ -521,6 +536,48 @@ This only requires configuring environment variables for S3 access (see Full Con
 
 ### 7. WebSocket Support
 
+For real-time communication (e.g., immediate task cancellation), the worker supports WebSocket connections. This is enabled by setting `WORKER_ENABLE_WEBSOCKETS=true`. When connected, the orchestrator can push commands like `cancel_task` directly to the worker.
+
+### 8. Middleware
+
+The worker supports a middleware system, allowing you to wrap task executions with custom logic. This is particularly useful for resource management (e.g., acquiring GPU locks), logging, error handling, or **Dependency Injection**.
+
+Middleware functions wrap the execution of the task handler (and any subsequent middlewares). They receive a context dictionary and the next handler in the chain.
+
+The `context` dictionary contains:
+- `task_id`, `job_id`, `task_name`: Metadata.
+- `params`: The validated parameters object.
+- `handler_kwargs`: A dictionary of arguments that will be passed to the handler. **Middleware can modify this dictionary to inject dependencies.**
+
+**Example: GPU Resource Manager & Dependency Injection**
+
+```python
+async def gpu_lock_middleware(context: dict, next_handler: callable):
+    # Pre-processing: Acquire resource
+    print(f"Acquiring GPU for task {context['task_id']}...")
+    model_path = await resource_manager.allocate()
+
+    # Inject the model path into the handler's arguments
+    context["handler_kwargs"]["model_path"] = model_path
+
+    try:
+        # Execute the next handler in the chain
+        result = await next_handler()
+        return result
+    finally:
+        # Post-processing: Release resource
+        print(f"Releasing GPU for task {context['task_id']}...")
+        resource_manager.release()
+
+# Register the middleware
+worker.add_middleware(gpu_lock_middleware)
+
+# Handler now receives 'model_path' automatically
+@worker.task("generate")
+def generate(params, model_path, **kwargs):
+    print(f"Using model at: {model_path}")
+```
+
 ## Advanced Features
 
 ### Reporting Skill & Model Dependencies
@@ -577,8 +634,11 @@ The worker is fully configured via environment variables.
 | `WORKER_TYPE` | A string identifying the type of the worker. | `generic-cpu-worker` |
 | `WORKER_PORT` | The port for the worker's health check server. | `8083` |
 | `WORKER_TOKEN` | A common authentication token used to connect to orchestrators. | `your-secret-worker-token` |
-
-
+- **`WORKER_INDIVIDUAL_TOKEN`**: An individual token for this worker, which overrides `WORKER_TOKEN` if set.
+- **`TLS_CA_PATH`**: Path to the CA certificate to verify the orchestrator.
+- **`TLS_CERT_PATH`**: Path to the client certificate for mTLS.
+- **`TLS_KEY_PATH`**: Path to the client private key for mTLS.
+- **`ORCHESTRATOR_URL`**: The address of the Avtomatika orchestrator.
 | `ORCHESTRATORS_CONFIG` | A JSON string with a list of orchestrators for multi-orchestrator modes. | `[]` |
 | `MULTI_ORCHESTRATOR_MODE` | The mode for handling multiple orchestrators. Possible values: `FAILOVER`, `ROUND_ROBIN`. | `FAILOVER` |
 | `MAX_CONCURRENT_TASKS` | The maximum number of tasks the worker can execute simultaneously. | `10` |
@@ -605,8 +665,9 @@ The worker is fully configured via environment variables.
 
 ## Development
 
-To install the necessary dependencies for running tests
+To install the necessary dependencies for running tests (assuming you are in the package root):
 
-
-
-
+1. Install the worker in editable mode with test dependencies:
+   ```bash
+   pip install -e .[test]
+   ```
avtomatika_worker-1.0b3/src/avtomatika_worker.egg-info/PKG-INFO → avtomatika_worker-1.0b5/README.md
RENAMED
@@ -1,34 +1,10 @@
-Metadata-Version: 2.4
-Name: avtomatika-worker
-Version: 1.0b3
-Summary: Worker SDK for the Avtomatika orchestrator.
-Project-URL: Homepage, https://github.com/avtomatika-ai/avtomatika-worker
-Project-URL: Bug Tracker, https://github.com/avtomatika-ai/avtomatika-worker/issues
-Classifier: Development Status :: 4 - Beta
-Classifier: Programming Language :: Python :: 3
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Operating System :: OS Independent
-Requires-Python: >=3.11
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: aiohttp~=3.13.2
-Requires-Dist: python-json-logger~=4.0.0
-Requires-Dist: obstore>=0.1
-Requires-Dist: aiofiles~=25.1.0
-Provides-Extra: test
-Requires-Dist: pytest; extra == "test"
-Requires-Dist: pytest-asyncio; extra == "test"
-Requires-Dist: aioresponses; extra == "test"
-Requires-Dist: pytest-mock; extra == "test"
-Requires-Dist: pydantic; extra == "test"
-Requires-Dist: types-aiofiles; extra == "test"
-Provides-Extra: pydantic
-Requires-Dist: pydantic; extra == "pydantic"
-Dynamic: license-file
-
 # Avtomatika Worker SDK
 
-
+[](https://opensource.org/licenses/MIT)
+[](https://www.python.org/downloads/release/python-3110/)
+[](https://github.com/astral-sh/ruff)
+
+This is the official SDK for creating workers compatible with the **[Avtomatika Orchestrator](https://github.com/avtomatika-ai/avtomatika)**. It is built upon the **[Avtomatika Protocol](https://github.com/avtomatika-ai/rxon)** and implements the **[HLN Protocol](https://github.com/avtomatika-ai/hln)**, handling all communication complexity (polling, heartbeats, S3 offloading) so you can focus on writing your business logic.
 
 ## Installation
 
[The remaining hunks of this renamed file repeat, verbatim, the README-body changes already shown in the PKG-INFO diff above (Task Handlers, Error Handling, WebSocket Support and Middleware, the configuration variables, and the Development section), differing only in hunk offsets.]
{avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/pyproject.toml
@@ -4,17 +4,24 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "avtomatika-worker"
-version = "1.0b3"
+version = "1.0b5"
 description = "Worker SDK for the Avtomatika orchestrator."
 readme = "README.md"
 requires-python = ">=3.11"
+authors = [
+    {name = "Dmitrii Gagarin", email = "madgagarin@gmail.com"},
+]
+keywords = ["worker", "sdk", "orchestrator", "distributed", "task-queue", "rxon", "hln"]
 classifiers = [
     "Development Status :: 4 - Beta",
+    "Intended Audience :: Developers",
     "Programming Language :: Python :: 3",
     "License :: OSI Approved :: MIT License",
     "Operating System :: OS Independent",
+    "Typing :: Typed",
 ]
 dependencies = [
+    "rxon==1.0b2",
     "aiohttp~=3.13.2",
     "python-json-logger~=4.0.0",
     "obstore>=0.1",
@@ -39,6 +46,9 @@ pydantic = ["pydantic"]
 [tool.setuptools.packages.find]
 where = ["src"]
 
+[tool.setuptools.package-data]
+"avtomatika_worker" = ["py.typed"]
+
 [tool.pytest.ini_options]
 markers = [
     "e2e: marks tests as end-to-end tests",
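The `[tool.setuptools.package-data]` addition exists to ship the PEP 561 `py.typed` marker inside built distributions; without it, setuptools omits the non-Python file and type checkers ignore the package's inline annotations despite the new `Typing :: Typed` classifier. A quick check against an installed wheel, as a sketch:

```python
# Sketch: verify the PEP 561 marker ships with an installed avtomatika-worker.
from importlib.resources import files

marker = files("avtomatika_worker").joinpath("py.typed")
print("py.typed present:", marker.is_file())  # expected True for 1.0b5 wheels
```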
{avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/src/avtomatika_worker/config.py
@@ -4,6 +4,8 @@ from os import getenv
 from typing import Any
 from uuid import uuid4
 
+from rxon.validators import validate_identifier
+
 
 class WorkerConfig:
     """A class for centralized management of worker configuration.
@@ -29,6 +31,9 @@ class WorkerConfig:
             "WORKER_INDIVIDUAL_TOKEN",
             getenv("WORKER_TOKEN", "your-secret-worker-token"),
         )
+        self.TLS_CA_PATH: str | None = getenv("TLS_CA_PATH")
+        self.TLS_CERT_PATH: str | None = getenv("TLS_CERT_PATH")
+        self.TLS_KEY_PATH: str | None = getenv("TLS_KEY_PATH")
 
         # --- Resources and performance ---
         self.COST_PER_SKILL: dict[str, float] = self._load_json_from_env("COST_PER_SKILL", default={})
@@ -73,6 +78,7 @@ class WorkerConfig:
 
     def validate(self) -> None:
         """Validates critical configuration parameters."""
+        validate_identifier(self.WORKER_ID, "WORKER_ID")
         if self.WORKER_TOKEN == "your-secret-worker-token":
             print("Warning: WORKER_TOKEN is set to the default value. Tasks might fail authentication.")
 
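The three new `TLS_*` settings are only read here; the diff does not show where they are consumed (presumably the HTTP/WebSocket client in the rewritten `worker.py`). As a sketch of how such paths typically become an mTLS-capable client context (illustrative, not the SDK's actual wiring):

```python
# Sketch only: illustrative mTLS context built from the new TLS_* settings.
import ssl

from avtomatika_worker.config import WorkerConfig

def build_ssl_context(cfg: WorkerConfig) -> ssl.SSLContext | None:
    if not cfg.TLS_CA_PATH:
        return None  # fall back to default certificate verification
    # Trust the orchestrator's CA when verifying its server certificate
    ctx = ssl.create_default_context(cafile=cfg.TLS_CA_PATH)
    if cfg.TLS_CERT_PATH and cfg.TLS_KEY_PATH:
        # Present a client certificate for mutual TLS
        ctx.load_cert_chain(certfile=cfg.TLS_CERT_PATH, keyfile=cfg.TLS_KEY_PATH)
    return ctx
```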
{avtomatika_worker-1.0b3 → avtomatika_worker-1.0b5}/src/avtomatika_worker/s3.py
@@ -4,13 +4,17 @@ from os import walk
 from os.path import basename, dirname, join, relpath
 from shutil import rmtree
 from typing import Any, cast
-from urllib.parse import urlparse
 
-import obstore
 from aiofiles import open as aio_open
 from aiofiles.os import makedirs
-from aiofiles.ospath import exists, isdir
+from aiofiles.ospath import exists, getsize, isdir
+from obstore import get as obstore_get
+from obstore import list as obstore_list
+from obstore import put as obstore_put
 from obstore.store import S3Store
+from rxon.blob import parse_uri
+from rxon.exceptions import IntegrityError
+from rxon.models import FileMetadata
 
 from .config import WorkerConfig
 
@@ -61,12 +65,12 @@
         if await exists(task_dir):
             await to_thread(lambda: rmtree(task_dir, ignore_errors=True))
 
-    async def _process_s3_uri(self, uri: str, task_id: str) -> str:
-        """Downloads a file or a folder
+    async def _process_s3_uri(self, uri: str, task_id: str, verify_meta: FileMetadata | None = None) -> str:
+        """Downloads a file or a folder from S3 and returns the local path.
+        If verify_meta is provided, performs integrity checks.
+        """
         try:
-            parsed_url = urlparse(uri)
-            bucket_name = parsed_url.netloc
-            object_key = parsed_url.path.lstrip("/")
+            bucket_name, object_key, is_directory = parse_uri(uri)
             store = self._get_store(bucket_name)
 
             # Use task-specific directory for isolation
@@ -76,36 +80,27 @@
             logger.info(f"Starting download from S3: {uri}")
 
             # Handle folder download (prefix)
-            if
+            if is_directory:
                 folder_name = object_key.rstrip("/").split("/")[-1]
                 local_folder_path = join(local_dir_root, folder_name)
-
-                # List objects with prefix
-                # obstore.list returns an async iterator of ObjectMeta
                 files_to_download = []
 
-
-                async for obj in obstore.list(store, prefix=object_key):
+                async for obj in obstore_list(store, prefix=object_key):
                     key = obj.key
-
                     if key.endswith("/"):
                         continue
-
-                    # Calculate relative path inside the folder
                     rel_path = key[len(object_key) :]
                     local_file_path = join(local_folder_path, rel_path)
-
                     await makedirs(dirname(local_file_path), exist_ok=True)
                     files_to_download.append((key, local_file_path))
 
                 async def _download_file(key: str, path: str) -> None:
                     async with self._semaphore:
-                        result = await
+                        result = await obstore_get(store, key)
                         async with aio_open(path, "wb") as f:
                             async for chunk in result.stream():
                                 await f.write(chunk)
 
-                # Execute downloads in parallel
                 if files_to_download:
                     await gather(*[_download_file(k, p) for k, p in files_to_download])
 
@@ -115,7 +110,20 @@
             # Handle single file download
             local_path = join(local_dir_root, basename(object_key))
 
-            result = await
+            result = await obstore_get(store, object_key)
+
+            # Integrity check before download
+            if verify_meta:
+                if verify_meta.size != result.meta.size:
+                    raise IntegrityError(
+                        f"Size mismatch for {uri}: expected {verify_meta.size}, got {result.meta.size}"
+                    )
+                if verify_meta.etag and result.meta.e_tag:
+                    actual_etag = result.meta.e_tag.strip('"')
+                    expected_etag = verify_meta.etag.strip('"')
+                    if actual_etag != expected_etag:
+                        raise IntegrityError(f"ETag mismatch for {uri}: expected {expected_etag}, got {actual_etag}")
+
             async with aio_open(local_path, "wb") as f:
                 async for chunk in result.stream():
                     await f.write(chunk)
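The `verify_meta` branch above compares the remote object's size and quote-normalized ETag against a `rxon` `FileMetadata` before streaming anything to disk. A sketch of what it enforces (field names follow the diff; `FileMetadata` construction details are assumed):

```python
# Sketch: the check the new verify_meta path performs. Field names follow
# the diff; FileMetadata construction details are assumed.
from rxon.models import FileMetadata

expected = FileMetadata(uri="s3://bucket/in.bin", size=1024, etag='"abc123"')

# S3 often returns ETags wrapped in double quotes, so both sides are
# normalized before comparison:
assert expected.etag.strip('"') == "abc123"

# If the object at the URI reports a different size or ETag,
# _process_s3_uri(..., verify_meta=expected) raises rxon's IntegrityError
# before the worker consumes a corrupted or swapped file.
```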
@@ -128,8 +136,8 @@
             logger.exception(f"Error during download of {uri}: {e}")
             raise
 
-    async def _upload_to_s3(self, local_path: str) ->
-        """Uploads a file or a folder to S3 and returns
+    async def _upload_to_s3(self, local_path: str) -> FileMetadata:
+        """Uploads a file or a folder to S3 and returns FileMetadata."""
         bucket_name = self._config.S3_DEFAULT_BUCKET
         store = self._get_store(bucket_name)
 
@@ -141,70 +149,90 @@
                 folder_name = basename(local_path.rstrip("/"))
                 s3_prefix = f"{folder_name}/"
 
-                # Use to_thread to avoid blocking event loop during file walk
                 def _get_files_to_upload():
+                    from os.path import getsize as std_getsize
+
                     files_to_upload = []
+                    total_size = 0
                     for root, _, files in walk(local_path):
                         for file in files:
                             f_path = join(root, file)
                             rel = relpath(f_path, local_path)
+                            total_size += std_getsize(f_path)
                            files_to_upload.append((f_path, f"{s3_prefix}{rel}"))
-                    return files_to_upload
+                    return files_to_upload, total_size
 
-                files_list = await to_thread(_get_files_to_upload)
+                files_list, total_size = await to_thread(_get_files_to_upload)
 
                 async def _upload_file(path: str, key: str) -> None:
                     async with self._semaphore:
-                        # obstore.put accepts bytes or file-like objects.
-                        # Since we are in async, reading small files is fine.
                        with open(path, "rb") as f:
-                            await
+                            await obstore_put(store, key, f)
 
                 if files_list:
-                    # Upload in parallel
                     await gather(*[_upload_file(f, k) for f, k in files_list])
 
                 s3_uri = f"s3://{bucket_name}/{s3_prefix}"
                 logger.info(f"Successfully uploaded folder to S3: {local_path} -> {s3_uri} ({len(files_list)} files)")
-                return s3_uri
+                return FileMetadata(uri=s3_uri, size=total_size)
 
             # Handle single file upload
             object_key = basename(local_path)
+            file_size = await getsize(local_path)
            with open(local_path, "rb") as f:
-                await
+                put_result = await obstore_put(store, object_key, f)
 
            s3_uri = f"s3://{bucket_name}/{object_key}"
-
-
+            etag = put_result.e_tag.strip('"') if put_result.e_tag else None
+            logger.info(f"Successfully uploaded file to S3: {local_path} -> {s3_uri} (ETag: {etag})")
+            return FileMetadata(uri=s3_uri, size=file_size, etag=etag)
 
         except Exception as e:
             logger.exception(f"Error during upload of {local_path}: {e}")
             raise
 
-    async def process_params(
-
+    async def process_params(
+        self, params: dict[str, Any], task_id: str, metadata: dict[str, FileMetadata] | None = None
+    ) -> dict[str, Any]:
+        """Recursively searches for S3 URIs in params and downloads the files.
+        Uses metadata for integrity verification if available.
+        """
         if not self._config.S3_ENDPOINT_URL:
             return params
 
-        async def _process(item: Any) -> Any:
+        async def _process(item: Any, key_path: str = "") -> Any:
             if isinstance(item, str) and item.startswith("s3://"):
-
+                verify_meta = metadata.get(key_path) if metadata else None
+                return await self._process_s3_uri(item, task_id, verify_meta=verify_meta)
             if isinstance(item, dict):
-                return {k: await _process(v) for k, v in item.items()}
-
+                return {k: await _process(v, f"{key_path}.{k}" if key_path else k) for k, v in item.items()}
+            if isinstance(item, list):
+                return [await _process(v, f"{key_path}[{i}]") for i, v in enumerate(item)]
+            return item
 
         return cast(dict[str, Any], await _process(params))
 
-    async def process_result(self, result: dict[str, Any]) -> dict[str, Any]:
-        """Recursively searches for local file paths in the result and uploads them to S3.
+    async def process_result(self, result: dict[str, Any]) -> tuple[dict[str, Any], dict[str, FileMetadata]]:
+        """Recursively searches for local file paths in the result and uploads them to S3.
+        Returns a tuple of (updated_result, metadata_map).
+        """
         if not self._config.S3_ENDPOINT_URL:
-            return result
+            return result, {}
+
+        metadata_map = {}
 
-        async def _process(item: Any) -> Any:
+        async def _process(item: Any, key_path: str = "") -> Any:
             if isinstance(item, str) and item.startswith(self._config.TASK_FILES_DIR):
-
+                if await exists(item):
+                    meta = await self._upload_to_s3(item)
+                    metadata_map[key_path] = meta
+                    return meta.uri
+                return item
             if isinstance(item, dict):
-                return {k: await _process(v) for k, v in item.items()}
-
+                return {k: await _process(v, f"{key_path}.{k}" if key_path else k) for k, v in item.items()}
+            if isinstance(item, list):
+                return [await _process(v, f"{key_path}[{i}]") for i, v in enumerate(item)]
+            return item
 
-
+        updated_result = cast(dict[str, Any], await _process(result))
+        return updated_result, metadata_map