avtomatika 1.0b7.tar.gz → 1.0b8.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (90)
  1. {avtomatika-1.0b7 → avtomatika-1.0b8}/PKG-INFO +50 -2
  2. avtomatika-1.0b7/src/avtomatika.egg-info/PKG-INFO → avtomatika-1.0b8/README.md +44 -47
  3. {avtomatika-1.0b7 → avtomatika-1.0b8}/pyproject.toml +5 -1
  4. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/app_keys.py +1 -0
  5. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/config.py +10 -0
  6. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/data_types.py +2 -1
  7. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/dispatcher.py +8 -26
  8. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/engine.py +19 -1
  9. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/executor.py +34 -6
  10. avtomatika-1.0b8/src/avtomatika/health_checker.py +57 -0
  11. avtomatika-1.0b8/src/avtomatika/history/base.py +105 -0
  12. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/history/noop.py +18 -7
  13. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/history/postgres.py +8 -6
  14. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/history/sqlite.py +7 -5
  15. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/metrics.py +1 -1
  16. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/reputation.py +46 -40
  17. avtomatika-1.0b8/src/avtomatika/s3.py +323 -0
  18. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/storage/base.py +45 -4
  19. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/storage/memory.py +44 -6
  20. avtomatika-1.0b8/src/avtomatika/storage/redis.py +443 -0
  21. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/utils/webhook_sender.py +44 -2
  22. avtomatika-1.0b8/src/avtomatika/watcher.py +78 -0
  23. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/ws_manager.py +7 -6
  24. avtomatika-1.0b7/README.md → avtomatika-1.0b8/src/avtomatika.egg-info/PKG-INFO +95 -2
  25. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika.egg-info/SOURCES.txt +2 -0
  26. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika.egg-info/requires.txt +7 -0
  27. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_dispatcher.py +31 -44
  28. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_dispatcher_extended.py +7 -3
  29. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_engine.py +38 -4
  30. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_executor.py +3 -0
  31. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_handlers.py +8 -0
  32. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_history.py +11 -6
  33. avtomatika-1.0b8/tests/test_postgres_history.py +84 -0
  34. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_reputation.py +2 -2
  35. avtomatika-1.0b8/tests/test_s3.py +265 -0
  36. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_webhook_sender.py +23 -3
  37. avtomatika-1.0b7/src/avtomatika/health_checker.py +0 -39
  38. avtomatika-1.0b7/src/avtomatika/history/base.py +0 -51
  39. avtomatika-1.0b7/src/avtomatika/storage/redis.py +0 -510
  40. avtomatika-1.0b7/src/avtomatika/watcher.py +0 -80
  41. avtomatika-1.0b7/tests/test_postgres_history.py +0 -107
  42. {avtomatika-1.0b7 → avtomatika-1.0b8}/LICENSE +0 -0
  43. {avtomatika-1.0b7 → avtomatika-1.0b8}/setup.cfg +0 -0
  44. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/__init__.py +0 -0
  45. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/api/handlers.py +0 -0
  46. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/api/routes.py +0 -0
  47. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/api.html +0 -0
  48. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/blueprint.py +0 -0
  49. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/client_config_loader.py +0 -0
  50. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/compression.py +0 -0
  51. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/constants.py +0 -0
  52. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/context.py +0 -0
  53. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/datastore.py +0 -0
  54. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/logging_config.py +0 -0
  55. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/py.typed +0 -0
  56. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/quota.py +0 -0
  57. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/ratelimit.py +0 -0
  58. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/scheduler.py +0 -0
  59. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/scheduler_config_loader.py +0 -0
  60. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/security.py +0 -0
  61. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/storage/__init__.py +0 -0
  62. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/telemetry.py +0 -0
  63. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/utils/__init__.py +0 -0
  64. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/worker_config_loader.py +0 -0
  65. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika.egg-info/dependency_links.txt +0 -0
  66. {avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika.egg-info/top_level.txt +0 -0
  67. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_blueprint_conditions.py +0 -0
  68. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_blueprint_integrity.py +0 -0
  69. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_blueprints.py +0 -0
  70. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_client_config_loader.py +0 -0
  71. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_compression.py +0 -0
  72. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_config_validation.py +0 -0
  73. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_context.py +0 -0
  74. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_error_handling.py +0 -0
  75. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_health_checker.py +0 -0
  76. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_integration.py +0 -0
  77. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_logging_config.py +0 -0
  78. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_memory_locking.py +0 -0
  79. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_memory_storage.py +0 -0
  80. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_metrics.py +0 -0
  81. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_noop_history.py +0 -0
  82. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_optimization.py +0 -0
  83. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_ratelimit.py +0 -0
  84. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_redis_locking.py +0 -0
  85. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_redis_storage.py +0 -0
  86. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_scheduler.py +0 -0
  87. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_telemetry.py +0 -0
  88. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_watcher.py +0 -0
  89. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_worker_config_loader.py +0 -0
  90. {avtomatika-1.0b7 → avtomatika-1.0b8}/tests/test_ws_manager.py +0 -0

{avtomatika-1.0b7 → avtomatika-1.0b8}/PKG-INFO

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: avtomatika
- Version: 1.0b7
+ Version: 1.0b8
  Summary: A state-machine based orchestrator for long-running AI and other jobs.
  Project-URL: Homepage, https://github.com/avtomatika-ai/avtomatika
  Project-URL: Bug Tracker, https://github.com/avtomatika-ai/avtomatika/issues
@@ -20,6 +20,9 @@ Requires-Dist: msgpack~=1.1
  Requires-Dist: orjson~=3.11
  Provides-Extra: redis
  Requires-Dist: redis~=7.1; extra == "redis"
+ Provides-Extra: s3
+ Requires-Dist: obstore>=0.2; extra == "s3"
+ Requires-Dist: aiofiles~=23.2; extra == "s3"
  Provides-Extra: history
  Requires-Dist: aiosqlite~=0.22; extra == "history"
  Requires-Dist: asyncpg~=0.30; extra == "history"
@@ -37,10 +40,13 @@ Requires-Dist: pytest-mock~=3.14; extra == "test"
  Requires-Dist: aioresponses~=0.7; extra == "test"
  Requires-Dist: backports.zstd~=1.2; extra == "test"
  Requires-Dist: opentelemetry-instrumentation-aiohttp-client; extra == "test"
+ Requires-Dist: obstore>=0.2; extra == "test"
+ Requires-Dist: aiofiles~=23.2; extra == "test"
  Provides-Extra: all
  Requires-Dist: avtomatika[redis]; extra == "all"
  Requires-Dist: avtomatika[history]; extra == "all"
  Requires-Dist: avtomatika[telemetry]; extra == "all"
+ Requires-Dist: avtomatika[s3]; extra == "all"
  Dynamic: license-file

  # Avtomatika Orchestrator
@@ -60,6 +66,7 @@ This document serves as a comprehensive guide for developers looking to build pi
  - [Parallel Execution and Aggregation (Fan-out/Fan-in)](#parallel-execution-and-aggregation-fan-outfan-in)
  - [Dependency Injection (DataStore)](#dependency-injection-datastore)
  - [Native Scheduler](#native-scheduler)
+ - [S3 Payload Offloading](#s3-payload-offloading)
  - [Webhook Notifications](#webhook-notifications)
  - [Production Configuration](#production-configuration)
  - [Fault Tolerance](#fault-tolerance)
@@ -107,6 +114,11 @@ Avtomatika is part of a larger ecosystem:
  pip install "avtomatika[telemetry]"
  ```

+ * **Install with S3 support (Payload Offloading):**
+ ```bash
+ pip install "avtomatika[s3]"
+ ```
+
  * **Install all dependencies, including for testing:**
  ```bash
  pip install "avtomatika[all,test]"
@@ -250,6 +262,19 @@ async def publish_handler_old_style(context):
  print(f"Job {context.job_id}: Publishing video at {output_path} ({duration}s).")
  context.actions.transition_to("complete")
  ```
+ ## Key Concepts: JobContext and Actions
+
+ ### High Performance Architecture
+
+ Avtomatika is engineered for high-load environments with thousands of concurrent workers.
+
+ * **O(1) Dispatcher**: Uses advanced Redis Set intersections to find suitable workers instantly, regardless of the cluster size. No O(N) scanning.
+ * **Non-Blocking I/O**:
+   * **Webhooks**: Sent via a bounded background queue to prevent backpressure.
+   * **History Logging**: Writes to SQL databases are buffered and asynchronous, ensuring the main execution loop never blocks.
+   * **Redis Streams**: Uses blocking reads to eliminate busy-waiting and reduce CPU usage.
+ * **Memory Safety**: S3 file transfers use streaming to handle multi-gigabyte files with constant, low RAM usage.
+
  ## Blueprint Cookbook: Key Features

  ### 1. Conditional Transitions (`.when()`)
@@ -365,7 +390,30 @@ daily_at = "02:00"

  The orchestrator can send asynchronous notifications to an external system when a job completes, fails, or is quarantined. This eliminates the need for clients to constantly poll the API for status updates.

- * **Usage:** Pass a `webhook_url` in the request body when creating a job.
+ ### 7. S3 Payload Offloading
+
+ Orchestrator provides first-class support for handling large files via S3-compatible storage, powered by the high-performance `obstore` library (Rust bindings).
+
+ * **Memory Safe (Streaming)**: Uses streaming for uploads and downloads, allowing processing of files larger than available RAM without OOM errors.
+ * **Managed Mode**: The Orchestrator manages file lifecycle (automatic cleanup of S3 objects and local temporary files on job completion).
+ * **Dependency Injection**: Use the `task_files` argument in your handlers to easily read/write data.
+ * **Directory Support**: Supports recursive download and upload of entire directories.
+
+ ```python
+ @bp.handler_for("process_data")
+ async def process_data(task_files, actions):
+     # Streaming download of a large file
+     local_path = await task_files.download("large_dataset.csv")
+
+     # ... process data ...
+
+     # Upload results
+     await task_files.write_json("results.json", {"status": "done"})
+
+     actions.transition_to("finished")
+ ```
+
+ ## Production Configuration
  * **Events:**
    * `job_finished`: The job reached a final success state.
    * `job_failed`: The job failed (e.g., due to an error or invalid input).

avtomatika-1.0b7/src/avtomatika.egg-info/PKG-INFO → avtomatika-1.0b8/README.md

@@ -1,48 +1,3 @@
- Metadata-Version: 2.4
- Name: avtomatika
- Version: 1.0b7
- Summary: A state-machine based orchestrator for long-running AI and other jobs.
- Project-URL: Homepage, https://github.com/avtomatika-ai/avtomatika
- Project-URL: Bug Tracker, https://github.com/avtomatika-ai/avtomatika/issues
- Classifier: Development Status :: 4 - Beta
- Classifier: Programming Language :: Python :: 3
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Operating System :: OS Independent
- Requires-Python: >=3.11
- Description-Content-Type: text/markdown
- License-File: LICENSE
- Requires-Dist: aiohttp~=3.12
- Requires-Dist: python-json-logger~=4.0
- Requires-Dist: graphviz~=0.21
- Requires-Dist: zstandard~=0.24
- Requires-Dist: aioprometheus~=23.12
- Requires-Dist: msgpack~=1.1
- Requires-Dist: orjson~=3.11
- Provides-Extra: redis
- Requires-Dist: redis~=7.1; extra == "redis"
- Provides-Extra: history
- Requires-Dist: aiosqlite~=0.22; extra == "history"
- Requires-Dist: asyncpg~=0.30; extra == "history"
- Provides-Extra: telemetry
- Requires-Dist: opentelemetry-api~=1.39; extra == "telemetry"
- Requires-Dist: opentelemetry-sdk~=1.39; extra == "telemetry"
- Requires-Dist: opentelemetry-exporter-otlp~=1.39; extra == "telemetry"
- Requires-Dist: opentelemetry-instrumentation-aiohttp-client~=0.59b0; extra == "telemetry"
- Provides-Extra: test
- Requires-Dist: pytest~=9.0; extra == "test"
- Requires-Dist: pytest-asyncio~=1.1; extra == "test"
- Requires-Dist: fakeredis~=2.33; extra == "test"
- Requires-Dist: pytest-aiohttp~=1.1; extra == "test"
- Requires-Dist: pytest-mock~=3.14; extra == "test"
- Requires-Dist: aioresponses~=0.7; extra == "test"
- Requires-Dist: backports.zstd~=1.2; extra == "test"
- Requires-Dist: opentelemetry-instrumentation-aiohttp-client; extra == "test"
- Provides-Extra: all
- Requires-Dist: avtomatika[redis]; extra == "all"
- Requires-Dist: avtomatika[history]; extra == "all"
- Requires-Dist: avtomatika[telemetry]; extra == "all"
- Dynamic: license-file
-
  # Avtomatika Orchestrator

  Avtomatika is a powerful, state-driven engine for managing complex asynchronous workflows in Python. It provides a robust framework for building scalable and resilient applications by separating process logic from execution logic.

The intermediate hunks of this file (@@ -60,6 +15,7 @@, @@ -107,6 +63,11 @@, @@ -250,6 +211,19 @@ and @@ -365,7 +339,30 @@) apply exactly the same README content changes already shown in the PKG-INFO diff above (the S3 Payload Offloading TOC entry, install instructions, High Performance Architecture section, and S3 section), only at shifted line offsets. The file then ends with:

@@ -533,4 +530,4 @@ For a deeper dive into the system, please refer to the following documents:
  - [**Architecture Guide**](https://github.com/avtomatika-ai/avtomatika/blob/main/docs/architecture.md): A detailed overview of the system components and their interactions.
  - [**API Reference**](https://github.com/avtomatika-ai/avtomatika/blob/main/docs/api_reference.md): Full specification of the HTTP API.
  - [**Deployment Guide**](https://github.com/avtomatika-ai/avtomatika/blob/main/docs/deployment.md): Instructions for deploying with Gunicorn/Uvicorn and NGINX.
- - [**Cookbook**](https://github.com/avtomatika-ai/avtomatika/blob/main/docs/cookbook/README.md): Examples and best practices for creating blueprints.
+ - [**Cookbook**](https://github.com/avtomatika-ai/avtomatika/blob/main/docs/cookbook/README.md): Examples and best practices for creating blueprints.

(The final line is removed and re-added unchanged, which in this viewer typically indicates a trailing-newline fix.)

{avtomatika-1.0b7 → avtomatika-1.0b8}/pyproject.toml

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "avtomatika"
- version = "1.0b7"
+ version = "1.0b8"
  description = "A state-machine based orchestrator for long-running AI and other jobs."
  readme = "README.md"
  requires-python = ">=3.11"
@@ -26,6 +26,7 @@ dependencies = [

  [project.optional-dependencies]
  redis = ["redis~=7.1"]
+ s3 = ["obstore>=0.2", "aiofiles~=23.2"]
  history = ["aiosqlite~=0.22", "asyncpg~=0.30"]
  telemetry = [
      "opentelemetry-api~=1.39",
@@ -42,11 +43,14 @@ test = [
      "aioresponses~=0.7",
      "backports.zstd~=1.2",
      "opentelemetry-instrumentation-aiohttp-client",
+     "obstore>=0.2",
+     "aiofiles~=23.2",
  ]
  all = [
      "avtomatika[redis]",
      "avtomatika[history]",
      "avtomatika[telemetry]",
+     "avtomatika[s3]",
  ]

  [project.urls]

{avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/app_keys.py

@@ -30,3 +30,4 @@ WATCHER_TASK_KEY = AppKey("watcher_task", Task)
  REPUTATION_CALCULATOR_TASK_KEY = AppKey("reputation_calculator_task", Task)
  HEALTH_CHECKER_TASK_KEY = AppKey("health_checker_task", Task)
  SCHEDULER_TASK_KEY = AppKey("scheduler_task", Task)
+ S3_SERVICE_KEY = AppKey("s3_service", "S3Service")

{avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/config.py

@@ -39,6 +39,7 @@ class Config:

          # Worker settings
          self.WORKER_TIMEOUT_SECONDS: int = int(getenv("WORKER_TIMEOUT_SECONDS", 300))
+         self.TASK_FILES_DIR: str = getenv("TASK_FILES_DIR", "/tmp/avtomatika-payloads")
          self.WORKER_POLL_TIMEOUT_SECONDS: int = int(
              getenv("WORKER_POLL_TIMEOUT_SECONDS", 30),
          )
@@ -52,10 +53,19 @@ class Config:
          self.EXECUTOR_MAX_CONCURRENT_JOBS: int = int(
              getenv("EXECUTOR_MAX_CONCURRENT_JOBS", 100),
          )
+         self.REDIS_STREAM_BLOCK_MS: int = int(getenv("REDIS_STREAM_BLOCK_MS", 5000))

          # History storage settings
          self.HISTORY_DATABASE_URI: str = getenv("HISTORY_DATABASE_URI", "")

+         # S3 settings
+         self.S3_ENDPOINT_URL: str = getenv("S3_ENDPOINT_URL", "")
+         self.S3_ACCESS_KEY: str = getenv("S3_ACCESS_KEY", "")
+         self.S3_SECRET_KEY: str = getenv("S3_SECRET_KEY", "")
+         self.S3_REGION: str = getenv("S3_REGION", "us-east-1")
+         self.S3_DEFAULT_BUCKET: str = getenv("S3_DEFAULT_BUCKET", "avtomatika-payloads")
+         self.S3_MAX_CONCURRENCY: int = int(getenv("S3_MAX_CONCURRENCY", 100))
+
          # Rate limiting settings
          self.RATE_LIMITING_ENABLED: bool = getenv("RATE_LIMITING_ENABLED", "true").lower() == "true"
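
This hunk introduces seven new settings: a local staging directory for task files, the S3 connection parameters, and the Redis Stream blocking interval used by the executor. A minimal sketch of driving them through the environment, assuming `Config` can be instantiated directly as the diff suggests (the endpoint and credential values below are placeholders for a local MinIO, not part of the package):

```python
# Illustrative only: exercising the new 1.0b8 settings. The variable names come
# from the diff above; the endpoint/credential values are placeholders.
import os

os.environ.update({
    "S3_ENDPOINT_URL": "http://localhost:9000",
    "S3_ACCESS_KEY": "minioadmin",
    "S3_SECRET_KEY": "minioadmin",
    "S3_DEFAULT_BUCKET": "avtomatika-payloads",
    "REDIS_STREAM_BLOCK_MS": "5000",  # "0" falls back to the 0.1 s polling loop
})

from avtomatika.config import Config

config = Config()
assert config.S3_REGION == "us-east-1"    # default when S3_REGION is unset
assert config.S3_MAX_CONCURRENCY == 100   # default when unset
```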

{avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/data_types.py

@@ -21,10 +21,11 @@ class JobContext(NamedTuple):
      state_history: dict[str, Any]
      client: ClientConfig
      actions: "ActionFactory"
-     data_stores: dict[str, Any] | None = None
+     data_stores: Any | None = None
      tracing_context: dict[str, Any] | None = None
      aggregation_results: dict[str, Any] | None = None
      webhook_url: str | None = None
+     task_files: Any | None = None


  class GPUInfo(NamedTuple):

{avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/dispatcher.py

@@ -137,32 +137,17 @@ class Dispatcher:
          dispatch_strategy = task_info.get("dispatch_strategy", "default")
          resource_requirements = task_info.get("resource_requirements")

-         all_workers = await self.storage.get_available_workers()
-         logger.info(f"Found {len(all_workers)} available workers")
-         if not all_workers:
-             raise RuntimeError("No available workers")
-
-         # A worker is considered available if its status is 'idle' or not specified (for backward compatibility)
-         logger.debug(f"All available workers: {[w['worker_id'] for w in all_workers]}")
-         idle_workers = [w for w in all_workers if w.get("status", "idle") == "idle"]
-         logger.debug(f"Idle workers: {[w['worker_id'] for w in idle_workers]}")
-         if not idle_workers:
-             if busy_mo_workers := [
-                 w for w in all_workers if w.get("status") == "busy" and "multi_orchestrator_info" in w
-             ]:
-                 logger.warning(
-                     f"No idle workers. Found {len(busy_mo_workers)} busy workers "
-                     f"in multi-orchestrator mode. They are likely performing tasks for other Orchestrators.",
-                 )
-             raise RuntimeError("No idle workers (all are 'busy')")
+         candidate_ids = await self.storage.find_workers_for_task(task_type)
+         if not candidate_ids:
+             logger.warning(f"No idle workers found for task '{task_type}'")
+             raise RuntimeError(f"No suitable workers for task type '{task_type}'")
+
+         capable_workers = await self.storage.get_workers(candidate_ids)
+         logger.debug(f"Found {len(capable_workers)} capable workers for task '{task_type}'")

-         # Filter by task type
-         capable_workers = [w for w in idle_workers if task_type in w.get("supported_tasks", [])]
-         logger.debug(f"Capable workers for task '{task_type}': {[w['worker_id'] for w in capable_workers]}")
          if not capable_workers:
-             raise RuntimeError(f"No suitable workers for task type '{task_type}'")
+             raise RuntimeError(f"No suitable workers for task type '{task_type}' (data missing)")

-         # Filter by resource requirements
          if resource_requirements:
              compliant_workers = [w for w in capable_workers if self._is_worker_compliant(w, resource_requirements)]
              logger.debug(
@@ -175,7 +160,6 @@ class Dispatcher:
              )
              capable_workers = compliant_workers

-         # Filter by maximum cost
          max_cost = task_info.get("max_cost")
          if max_cost is not None:
              cost_compliant_workers = [w for w in capable_workers if w.get("cost_per_second", float("inf")) <= max_cost]
@@ -188,7 +172,6 @@ class Dispatcher:
              )
              capable_workers = cost_compliant_workers

-         # Select worker according to strategy
          if dispatch_strategy == "round_robin":
              selected_worker = self._select_round_robin(capable_workers, task_type)
          elif dispatch_strategy == "least_connections":
@@ -205,7 +188,6 @@ class Dispatcher:
              f"Dispatching task '{task_type}' to worker {worker_id} (strategy: {dispatch_strategy})",
          )

-         # --- Task creation and enqueuing ---
          task_id = task_info.get("task_id") or str(uuid4())
          payload = {
              "job_id": job_id,

{avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/engine.py

@@ -19,6 +19,7 @@ from .app_keys import (
      HTTP_SESSION_KEY,
      REPUTATION_CALCULATOR_KEY,
      REPUTATION_CALCULATOR_TASK_KEY,
+     S3_SERVICE_KEY,
      SCHEDULER_KEY,
      SCHEDULER_TASK_KEY,
      WATCHER_KEY,
@@ -37,6 +38,7 @@ from .history.base import HistoryStorageBase
  from .history.noop import NoOpHistoryStorage
  from .logging_config import setup_logging
  from .reputation import ReputationCalculator
+ from .s3 import S3Service
  from .scheduler import Scheduler
  from .storage.base import StorageBackend
  from .telemetry import setup_telemetry
@@ -141,6 +143,11 @@ class OrchestratorEngine:
          self.history_storage = NoOpHistoryStorage()

      async def on_startup(self, app: web.Application) -> None:
+         # 1. Fail Fast: Check Storage Connection
+         if not await self.storage.ping():
+             logger.critical("Failed to connect to Storage Backend (Redis). Exiting.")
+             raise RuntimeError("Storage Backend is unavailable.")
+
          try:
              from opentelemetry.instrumentation.aiohttp_client import (
                  AioHttpClientInstrumentor,
@@ -152,6 +159,8 @@ class OrchestratorEngine:
                  "opentelemetry-instrumentation-aiohttp-client not found. AIOHTTP client instrumentation is disabled."
              )
          await self._setup_history_storage()
+         # Start history background worker
+         await self.history_storage.start()

          # Load client configs if the path is provided
          if self.config.CLIENTS_CONFIG_PATH:
@@ -188,6 +197,7 @@ class OrchestratorEngine:

          app[HTTP_SESSION_KEY] = ClientSession()
          self.webhook_sender = WebhookSender(app[HTTP_SESSION_KEY])
+         self.webhook_sender.start()
          self.dispatcher = Dispatcher(self.storage, self.config)
          app[DISPATCHER_KEY] = self.dispatcher
          app[EXECUTOR_KEY] = JobExecutor(self, self.history_storage)
@@ -196,6 +206,7 @@ class OrchestratorEngine:
          app[HEALTH_CHECKER_KEY] = HealthChecker(self)
          app[SCHEDULER_KEY] = Scheduler(self)
          app[WS_MANAGER_KEY] = self.ws_manager
+         app[S3_SERVICE_KEY] = S3Service(self.config, self.history_storage)

          app[EXECUTOR_TASK_KEY] = create_task(app[EXECUTOR_KEY].run())
          app[WATCHER_TASK_KEY] = create_task(app[WATCHER_KEY].run())
@@ -220,6 +231,13 @@ class OrchestratorEngine:
          logger.info("Closing WebSocket connections...")
          await self.ws_manager.close_all()

+         logger.info("Stopping WebhookSender...")
+         await self.webhook_sender.stop()
+
+         if S3_SERVICE_KEY in app:
+             logger.info("Closing S3 Service...")
+             await app[S3_SERVICE_KEY].close()
+
          logger.info("Cancelling background tasks...")
          app[HEALTH_CHECKER_TASK_KEY].cancel()
          app[WATCHER_TASK_KEY].cancel()
@@ -352,7 +370,7 @@ class OrchestratorEngine:
          )

          # Run in background to not block the main flow
-         create_task(self.webhook_sender.send(webhook_url, payload))
+         await self.webhook_sender.send(webhook_url, payload)

      def run(self) -> None:
          self.setup()
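
Two lifecycle changes stand out here: the fail-fast `storage.ping()` on startup, and the webhook path switching from fire-and-forget `create_task()` to an awaited `send()` bracketed by explicit `start()`/`stop()`. The latter matches the README's "bounded background queue" description; a sketch of that pattern, under the assumption that `send()` now only enqueues (the actual implementation is in `src/avtomatika/utils/webhook_sender.py`, changed +44 -2 in this release, and may differ in detail):

```python
# Sketch of a bounded background webhook queue; names and sizes are illustrative.
from asyncio import Queue, Task, create_task

from aiohttp import ClientSession

class WebhookSenderSketch:
    def __init__(self, session: ClientSession, maxsize: int = 1000) -> None:
        self._session = session
        self._queue: Queue[tuple[str, dict]] = Queue(maxsize=maxsize)
        self._task: Task | None = None

    def start(self) -> None:
        self._task = create_task(self._drain())

    async def send(self, url: str, payload: dict) -> None:
        # Cheap to await: blocks only when the queue is full (backpressure),
        # so callers no longer need fire-and-forget create_task() wrappers.
        await self._queue.put((url, payload))

    async def _drain(self) -> None:
        while True:
            url, payload = await self._queue.get()
            try:
                async with self._session.post(url, json=payload) as resp:
                    resp.raise_for_status()
            except Exception:
                pass  # a real sender would log and/or retry here
            finally:
                self._queue.task_done()

    async def stop(self) -> None:
        await self._queue.join()  # flush pending webhooks before shutdown
        if self._task:
            self._task.cancel()
```

With this shape, `stop()` can drain outstanding notifications before teardown, which is consistent with `on_shutdown` now awaiting it before cancelling the background tasks.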

{avtomatika-1.0b7 → avtomatika-1.0b8}/src/avtomatika/executor.py

@@ -47,6 +47,7 @@ except ImportError:
      inject = NoOpPropagate().inject
      TraceContextTextMapPropagator = NoOpTraceContextTextMapPropagator()  # Instantiate the class

+ from .app_keys import S3_SERVICE_KEY
  from .context import ActionFactory
  from .data_types import ClientConfig, JobContext
  from .history.base import HistoryStorageBase
@@ -74,7 +75,7 @@ class JobExecutor:
          self._running = False
          self._processing_messages: set[str] = set()

-     async def _process_job(self, job_id: str, message_id: str):
+     async def _process_job(self, job_id: str, message_id: str) -> None:
          """The core logic for processing a single job dequeued from storage."""
          if message_id in self._processing_messages:
              return
@@ -143,6 +144,11 @@ class JobExecutor:
              plan=client_config_dict.get("plan", "unknown"),
              params=client_config_dict.get("params", {}),
          )
+
+         # Get TaskFiles if S3 service is available
+         s3_service = self.engine.app.get(S3_SERVICE_KEY)
+         task_files = s3_service.get_task_files(job_id) if s3_service else None
+
          context = JobContext(
              job_id=job_id,
              current_state=job_state["current_state"],
@@ -153,6 +159,7 @@ class JobExecutor:
              data_stores=SimpleNamespace(**blueprint.data_stores),
              tracing_context=tracing_context,
              aggregation_results=job_state.get("aggregation_results"),
+             task_files=task_files,
          )

          try:
@@ -173,12 +180,17 @@ class JobExecutor:
                  params_to_inject["context"] = context
              if "actions" in param_names:
                  params_to_inject["actions"] = action_factory
+             if "task_files" in param_names:
+                 params_to_inject["task_files"] = task_files
          else:
              # New injection logic with prioritized lookup.
              context_as_dict = context._asdict()
              for param_name in param_names:
+                 # Direct injection of task_files
+                 if param_name == "task_files":
+                     params_to_inject[param_name] = task_files
                  # Look in JobContext fields first.
-                 if param_name in context_as_dict:
+                 elif param_name in context_as_dict:
                      params_to_inject[param_name] = context_as_dict[param_name]
                  # Then look in state_history (data from previous steps/workers).
                  elif param_name in context.state_history:
@@ -258,6 +270,15 @@ class JobExecutor:
              await self.storage.enqueue_job(job_id)
          else:
              logger.info(f"Job {job_id} reached terminal state {next_state}")
+
+             # Clean up S3 files if service is available
+             s3_service = self.engine.app.get(S3_SERVICE_KEY)
+             if s3_service:
+                 task_files = s3_service.get_task_files(job_id)
+                 if task_files:
+                     # Run cleanup in background to not block response
+                     create_task(task_files.cleanup())
+
              await self._check_and_resume_parent(job_state)
              # Send webhook for finished/failed jobs
              event_type = "job_finished" if next_state == "finished" else "job_failed"
@@ -522,7 +543,10 @@ class JobExecutor:
                  # Wait for an available slot before fetching a new job
                  await semaphore.acquire()

-                 result = await self.storage.dequeue_job()
+                 # Block for a configured time waiting for a job
+                 block_time = self.engine.config.REDIS_STREAM_BLOCK_MS
+                 result = await self.storage.dequeue_job(block=block_time if block_time > 0 else None)
+
                  if result:
                      job_id, message_id = result
                      task = create_task(self._process_job(job_id, message_id))
@@ -530,14 +554,18 @@ class JobExecutor:
                      # Release the semaphore slot when the task is done
                      task.add_done_callback(lambda _: semaphore.release())
                  else:
-                     # No job found, release the slot and wait a bit
+                     # Timeout reached, release slot and loop again
                      semaphore.release()
-                     # Prevent busy loop if storage returns None immediately
-                     await sleep(0.1)
+                     # Prevent busy loop if blocking is disabled (e.g. in tests) or failed
+                     if block_time <= 0:
+                         await sleep(0.1)
+
              except CancelledError:
                  break
              except Exception:
                  logger.exception("Error in JobExecutor main loop.")
+                 # If an error occurred (e.g. Redis connection lost), sleep briefly to avoid log spam
+                 semaphore.release()
                  await sleep(1)
          logger.info("JobExecutor stopped.")
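
The main-loop change threads `REDIS_STREAM_BLOCK_MS` into `dequeue_job(block=...)`, so the 0.1 s polling sleep survives only as a fallback when blocking is disabled. A sketch of what a blocking dequeue over a Redis Stream consumer group typically looks like (group, consumer, and stream names are invented for illustration; the real code is in the rewritten `src/avtomatika/storage/redis.py`):

```python
# Illustrative sketch of a blocking dequeue over a Redis Stream consumer group.
from redis.asyncio import Redis

async def dequeue_job(redis: Redis, block: int | None = None) -> tuple[str, str] | None:
    # With block set (in milliseconds), XREADGROUP parks the connection
    # server-side until a message arrives or the timeout elapses, so no
    # client-side polling is needed.
    resp = await redis.xreadgroup(
        "executors",           # consumer group (hypothetical name)
        "executor-1",          # consumer name (hypothetical)
        {"jobs:stream": ">"},  # read only new, undelivered messages
        count=1,
        block=block,
    )
    if not resp:
        return None  # timed out; the caller releases its semaphore slot and loops
    _stream, messages = resp[0]
    message_id, fields = messages[0]
    return fields[b"job_id"].decode(), message_id.decode()
```

Also worth noting: the new `semaphore.release()` in the generic `except Exception` branch fixes a slot leak; previously, an error raised between `acquire()` and task creation would have permanently consumed a concurrency slot.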

avtomatika-1.0b8/src/avtomatika/health_checker.py (new file)

@@ -0,0 +1,57 @@
+ """This module previously contained an active HealthChecker.
+ In the new architecture with heartbeat messages from workers,
+ the orchestrator no longer needs to actively poll workers.
+
+ Redis automatically deletes worker keys when their TTL expires,
+ and `storage.get_available_workers()` only retrieves active keys.
+
+ This file is left as a placeholder in case passive health-check
+ logic is needed in the future (e.g., for logging expired workers).
+ """
+
+ from asyncio import CancelledError, sleep
+ from logging import getLogger
+ from typing import TYPE_CHECKING
+
+ if TYPE_CHECKING:
+     from .engine import OrchestratorEngine
+
+ logger = getLogger(__name__)
+
+
+ class HealthChecker:
+     def __init__(self, engine: "OrchestratorEngine", interval_seconds: int = 600):
+         self.engine = engine
+         self.storage = engine.storage
+         self.interval_seconds = interval_seconds
+         self._running = False
+         from uuid import uuid4
+
+         self._instance_id = str(uuid4())
+
+     async def run(self):
+         logger.info(f"HealthChecker started (Active Index Cleanup, Instance ID: {self._instance_id}).")
+         self._running = True
+         while self._running:
+             try:
+                 # Use distributed lock to ensure only one instance cleans up
+                 if await self.storage.acquire_lock(
+                     "global_health_check_lock", self._instance_id, self.interval_seconds - 5
+                 ):
+                     try:
+                         await self.storage.cleanup_expired_workers()
+                     finally:
+                         # We don't release the lock immediately to prevent other instances from
+                         # running the same task if the interval is small.
+                         pass
+
+                 await sleep(self.interval_seconds)
+             except CancelledError:
+                 break
+             except Exception:
+                 logger.exception("Error in HealthChecker main loop.")
+                 await sleep(60)
+         logger.info("HealthChecker stopped.")
+
+     def stop(self):
+         self._running = False
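
The cleanup above relies on a storage-level `acquire_lock(name, owner, ttl)` primitive so that only one orchestrator instance runs the index sweep per interval. A common way to implement such a lock is the Redis `SET NX EX` pattern, sketched here as an assumption (the actual implementation belongs to the storage backend, e.g. the rewritten `storage/redis.py`):

```python
# Sketch: distributed lock via SET NX EX; the lock:* key prefix is illustrative.
from redis.asyncio import Redis

async def acquire_lock(redis: Redis, name: str, owner: str, ttl_seconds: int) -> bool:
    # NX makes the SET succeed for exactly one caller; EX auto-expires the lock
    # so a crashed holder cannot wedge it forever. Deliberately not releasing
    # after cleanup (as the HealthChecker above does) keeps the lock held for
    # the rest of the TTL window, preventing other instances from re-running
    # the same sweep within one interval.
    return bool(await redis.set(f"lock:{name}", owner, nx=True, ex=ttl_seconds))
```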