compose-runner 0.6.1__tar.gz → 0.6.2rc1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/Dockerfile +2 -2
  2. compose_runner-0.6.2rc1/PKG-INFO +79 -0
  3. compose_runner-0.6.2rc1/README.md +56 -0
  4. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/_version.py +2 -2
  5. compose_runner-0.6.2rc1/compose_runner/aws_lambda/common.py +60 -0
  6. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/aws_lambda/log_poll_handler.py +15 -39
  7. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/aws_lambda/results_handler.py +13 -38
  8. compose_runner-0.6.2rc1/compose_runner/aws_lambda/run_handler.py +115 -0
  9. compose_runner-0.6.2rc1/compose_runner/aws_lambda/status_handler.py +102 -0
  10. compose_runner-0.6.2rc1/compose_runner/ecs_task.py +145 -0
  11. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/tests/test_lambda_handlers.py +87 -30
  12. compose_runner-0.6.2rc1/infra/cdk/stacks/compose_runner_stack.py +339 -0
  13. compose_runner-0.6.1/PKG-INFO +0 -62
  14. compose_runner-0.6.1/README.md +0 -39
  15. compose_runner-0.6.1/compose_runner/aws_lambda/run_handler.py +0 -137
  16. compose_runner-0.6.1/infra/cdk/stacks/compose_runner_stack.py +0 -162
  17. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/.gitignore +0 -0
  18. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/LICENSE +0 -0
  19. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/aws_lambda/.dockerignore +0 -0
  20. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/aws_lambda/Dockerfile +0 -0
  21. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/__init__.py +0 -0
  22. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/aws_lambda/__init__.py +0 -0
  23. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/cli.py +0 -0
  24. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/run.py +0 -0
  25. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/sentry.py +0 -0
  26. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/tests/cassettes/test_run/test_download_bundle.yaml +0 -0
  27. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/tests/cassettes/test_run/test_run_database_workflow.yaml +0 -0
  28. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/tests/cassettes/test_run/test_run_group_comparison_workflow.yaml +0 -0
  29. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/tests/cassettes/test_run/test_run_string_group_comparison_workflow.yaml +0 -0
  30. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/tests/cassettes/test_run/test_run_workflow.yaml +0 -0
  31. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/tests/conftest.py +0 -0
  32. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/tests/test_cli.py +0 -0
  33. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/compose_runner/tests/test_run.py +0 -0
  34. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/infra/cdk/app.py +0 -0
  35. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/infra/cdk/cdk.json +0 -0
  36. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/infra/cdk/requirements.txt +0 -0
  37. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/infra/cdk/stacks/__init__.py +0 -0
  38. {compose_runner-0.6.1 → compose_runner-0.6.2rc1}/pyproject.toml +0 -0
@@ -14,7 +14,7 @@ RUN hatch dep show requirements > requirements.txt && pip install -r requirement
 
 COPY . .
 
-# install the package (more likely to change, leverage caching!)
-RUN pip install .
+# install the package with AWS extras so the ECS task has boto3, etc.
+RUN pip install '.[aws]'
 
 ENTRYPOINT ["compose-run"]
@@ -0,0 +1,79 @@
+Metadata-Version: 2.4
+Name: compose-runner
+Version: 0.6.2rc1
+Summary: A package for running neurosynth-compose analyses
+Project-URL: Repository, https://github.com/neurostuff/compose-runner
+Author-email: James Kent <jamesdkent21@gmail.com>
+License: BSD 3-Clause License
+License-File: LICENSE
+Keywords: meta-analysis,neuroimaging,neurosynth,neurosynth-compose
+Classifier: License :: OSI Approved :: BSD License
+Classifier: Programming Language :: Python :: 3
+Requires-Dist: click
+Requires-Dist: nimare
+Requires-Dist: numpy
+Requires-Dist: sentry-sdk
+Provides-Extra: aws
+Requires-Dist: boto3; extra == 'aws'
+Provides-Extra: tests
+Requires-Dist: pytest; extra == 'tests'
+Requires-Dist: pytest-recording; extra == 'tests'
+Requires-Dist: vcrpy; extra == 'tests'
+Description-Content-Type: text/markdown
+
+# compose-runner
+
+Python package to execute meta-analyses created using neurosynth compose and NiMARE
+as the meta-analysis execution engine.
+
+## AWS Deployment
+
+This repository includes an AWS CDK application that turns compose-runner into a
+serverless batch pipeline using Step Functions, AWS Lambda, and ECS Fargate.
+The deployed architecture works like this:
+
+- `ComposeRunnerSubmit` (Lambda Function URL) accepts HTTP requests, validates
+  the meta-analysis payload, and starts a Step Functions execution. The response
+  is immediate and returns both a durable `job_id` (the execution ARN) and the
+  `artifact_prefix` used for S3 and log correlation.
+- A Standard state machine runs a single Fargate task (`compose_runner.ecs_task`)
+  and waits for completion. The container downloads inputs, executes the
+  meta-analysis on up to 4 vCPU / 30 GiB of memory, uploads artifacts to S3, and
+  writes `metadata.json` into the same prefix.
+- `ComposeRunnerStatus` (Lambda Function URL) wraps `DescribeExecution`, merges
+  metadata from S3, and exposes a simple status endpoint suitable for polling.
+- `ComposeRunnerLogPoller` streams the ECS CloudWatch Logs for a given `artifact_prefix`,
+  while `ComposeRunnerResultsFetcher` returns presigned URLs for stored artifacts.
+
+1. Create a virtual environment and install the CDK dependencies:
+   ```bash
+   cd infra/cdk
+   python -m venv .venv
+   source .venv/bin/activate
+   pip install -r requirements.txt
+   ```
+2. (One-time per account/region) bootstrap the CDK environment:
+   ```bash
+   cdk bootstrap
+   ```
+3. Deploy the stack (supplying the compose-runner version you want baked into the images):
+   ```bash
+   cdk deploy \
+     -c composeRunnerVersion=$(hatch version) \
+     -c resultsPrefix=compose-runner/results \
+     -c taskCpu=4096 \
+     -c taskMemoryMiB=30720
+   ```
+   Pass `-c resultsBucketName=<bucket>` to use an existing S3 bucket, or omit it
+   to let the stack create and retain a dedicated bucket. Additional knobs:
+
+   - `-c stateMachineTimeoutSeconds=7200` to control the max wall clock per run
+   - `-c submitTimeoutSeconds` / `-c statusTimeoutSeconds` / `-c pollTimeoutSeconds`
+     to tune Lambda timeouts
+   - `-c taskEphemeralStorageGiB` if the default 21 GiB scratch volume is insufficient
+
+The deployment builds both the Lambda image (`aws_lambda/Dockerfile`) and the
+Fargate task image (`Dockerfile`), provisions the Step Functions state machine,
+and configures a public VPC so each task has outbound internet access.
+The CloudFormation outputs list the HTTPS endpoints for submission, status,
+logs, and artifact retrieval, alongside the Step Functions ARN.
@@ -0,0 +1,56 @@
+# compose-runner
+
+Python package to execute meta-analyses created using neurosynth compose and NiMARE
+as the meta-analysis execution engine.
+
+## AWS Deployment
+
+This repository includes an AWS CDK application that turns compose-runner into a
+serverless batch pipeline using Step Functions, AWS Lambda, and ECS Fargate.
+The deployed architecture works like this:
+
+- `ComposeRunnerSubmit` (Lambda Function URL) accepts HTTP requests, validates
+  the meta-analysis payload, and starts a Step Functions execution. The response
+  is immediate and returns both a durable `job_id` (the execution ARN) and the
+  `artifact_prefix` used for S3 and log correlation.
+- A Standard state machine runs a single Fargate task (`compose_runner.ecs_task`)
+  and waits for completion. The container downloads inputs, executes the
+  meta-analysis on up to 4 vCPU / 30 GiB of memory, uploads artifacts to S3, and
+  writes `metadata.json` into the same prefix.
+- `ComposeRunnerStatus` (Lambda Function URL) wraps `DescribeExecution`, merges
+  metadata from S3, and exposes a simple status endpoint suitable for polling.
+- `ComposeRunnerLogPoller` streams the ECS CloudWatch Logs for a given `artifact_prefix`,
+  while `ComposeRunnerResultsFetcher` returns presigned URLs for stored artifacts.
+
+1. Create a virtual environment and install the CDK dependencies:
+   ```bash
+   cd infra/cdk
+   python -m venv .venv
+   source .venv/bin/activate
+   pip install -r requirements.txt
+   ```
+2. (One-time per account/region) bootstrap the CDK environment:
+   ```bash
+   cdk bootstrap
+   ```
+3. Deploy the stack (supplying the compose-runner version you want baked into the images):
+   ```bash
+   cdk deploy \
+     -c composeRunnerVersion=$(hatch version) \
+     -c resultsPrefix=compose-runner/results \
+     -c taskCpu=4096 \
+     -c taskMemoryMiB=30720
+   ```
+   Pass `-c resultsBucketName=<bucket>` to use an existing S3 bucket, or omit it
+   to let the stack create and retain a dedicated bucket. Additional knobs:
+
+   - `-c stateMachineTimeoutSeconds=7200` to control the max wall clock per run
+   - `-c submitTimeoutSeconds` / `-c statusTimeoutSeconds` / `-c pollTimeoutSeconds`
+     to tune Lambda timeouts
+   - `-c taskEphemeralStorageGiB` if the default 21 GiB scratch volume is insufficient
+
+The deployment builds both the Lambda image (`aws_lambda/Dockerfile`) and the
+Fargate task image (`Dockerfile`), provisions the Step Functions state machine,
+and configures a public VPC so each task has outbound internet access.
+The CloudFormation outputs list the HTTPS endpoints for submission, status,
+logs, and artifact retrieval, alongside the Step Functions ARN.
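The submit/poll flow described in this README can be exercised end to end once the stack is deployed. The sketch below is illustrative only: the Function URL values are placeholders for the CloudFormation outputs and `meta_analysis_id` is a dummy ID, while the field names (`job_id`, `artifact_prefix`, `status`, `result`) follow the Lambda handlers added in this release.

```python
# Illustrative submit-and-poll round trip against the deployed Function URLs.
# SUBMIT_URL / STATUS_URL and the meta_analysis_id are placeholders.
import json
import time
import urllib.request

SUBMIT_URL = "https://<submit-function-url>.lambda-url.us-east-1.on.aws/"
STATUS_URL = "https://<status-function-url>.lambda-url.us-east-1.on.aws/"


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Submit: returns immediately with job_id (execution ARN) and artifact_prefix.
job = post_json(SUBMIT_URL, {"meta_analysis_id": "abc123"})
print(job["job_id"], job["artifact_prefix"], job["status"])

# Poll: the status handler reports RUNNING / SUCCEEDED / FAILED and, once the
# task finishes, merges metadata.json from S3 into "result".
while True:
    status = post_json(STATUS_URL, {"job_id": job["job_id"]})
    if status["status"] in {"SUCCEEDED", "FAILED"}:
        break
    time.sleep(30)
print(status["status"], status.get("result"))
```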
@@ -28,7 +28,7 @@ version_tuple: VERSION_TUPLE
 commit_id: COMMIT_ID
 __commit_id__: COMMIT_ID
 
-__version__ = version = '0.6.1'
-__version_tuple__ = version_tuple = (0, 6, 1)
+__version__ = version = '0.6.2rc1'
+__version_tuple__ = version_tuple = (0, 6, 2, 'rc1')
 
 __commit_id__ = commit_id = None
@@ -0,0 +1,60 @@
+from __future__ import annotations
+
+import base64
+import json
+from dataclasses import dataclass
+from typing import Any, Dict, Optional
+
+
+def is_http_event(event: Any) -> bool:
+    return isinstance(event, dict) and "requestContext" in event
+
+
+def _decode_body(event: Dict[str, Any]) -> Optional[str]:
+    body = event.get("body")
+    if not body:
+        return None
+    if event.get("isBase64Encoded"):
+        return base64.b64decode(body).decode("utf-8")
+    return body
+
+
+def extract_payload(event: Dict[str, Any]) -> Dict[str, Any]:
+    if not is_http_event(event):
+        return event
+    body = _decode_body(event)
+    if not body:
+        return {}
+    return json.loads(body)
+
+
+def http_response(body: Dict[str, Any], status_code: int = 200) -> Dict[str, Any]:
+    return {
+        "statusCode": status_code,
+        "headers": {"Content-Type": "application/json"},
+        "body": json.dumps(body),
+    }
+
+
+@dataclass(frozen=True)
+class LambdaRequest:
+    raw_event: Any
+    payload: Dict[str, Any]
+    is_http: bool
+
+    @classmethod
+    def parse(cls, event: Any) -> "LambdaRequest":
+        payload = extract_payload(event)
+        return cls(raw_event=event, payload=payload, is_http=is_http_event(event))
+
+    def respond(self, body: Dict[str, Any], status_code: int = 200) -> Dict[str, Any]:
+        if self.is_http:
+            return http_response(body, status_code)
+        return body
+
+    def bad_request(self, message: str, status_code: int = 400) -> Dict[str, Any]:
+        return self.respond({"status": "FAILED", "error": message}, status_code=status_code)
+
+    def get(self, key: str, default: Any = None) -> Any:
+        return self.payload.get(key, default)
+
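For context, a minimal sketch of how the handlers below are expected to consume this new helper. The `echo_handler` function and its `name` field are illustrative, not part of the package:

```python
# Hypothetical handler built on compose_runner.aws_lambda.common.LambdaRequest.
from typing import Any, Dict

from compose_runner.aws_lambda.common import LambdaRequest


def echo_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # parse() accepts both Function URL events (JSON/base64 body) and plain dicts.
    request = LambdaRequest.parse(event)
    name = request.get("name")
    if not name:
        # For HTTP events this becomes a 400 envelope; direct invokes get the body as-is.
        return request.bad_request("Request payload must include 'name'.")
    # respond() adds the statusCode/headers/body wrapper only for HTTP events.
    return request.respond({"greeting": f"hello {name}"})
```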
@@ -2,52 +2,30 @@ from __future__ import annotations
 
 import os
 import time
-import base64
-import json
 from typing import Any, Dict, List
 
 import boto3
 
+from compose_runner.aws_lambda.common import LambdaRequest
+
 _LOGS_CLIENT = boto3.client("logs", region_name=os.environ.get("AWS_REGION", "us-east-1"))
 
 LOG_GROUP_ENV = "RUNNER_LOG_GROUP"
 DEFAULT_LOOKBACK_MS_ENV = "DEFAULT_LOOKBACK_MS"
 
-def _is_http_event(event: Any) -> bool:
-    return isinstance(event, dict) and "requestContext" in event
-
-
-def _extract_payload(event: Dict[str, Any]) -> Dict[str, Any]:
-    if not _is_http_event(event):
-        return event
-    body = event.get("body")
-    if not body:
-        return {}
-    if event.get("isBase64Encoded"):
-        body = base64.b64decode(body).decode("utf-8")
-    return json.loads(body)
-
-
-def _http_response(body: Dict[str, Any], status_code: int = 200) -> Dict[str, Any]:
-    return {
-        "statusCode": status_code,
-        "headers": {"Content-Type": "application/json"},
-        "body": json.dumps(body),
-    }
-
 
 def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
-    raw_event = event
-    event = _extract_payload(event)
-    job_id = event.get("job_id")
-    if not job_id:
-        message = "Request payload must include 'job_id'."
-        if _is_http_event(raw_event):
-            return _http_response({"status": "FAILED", "error": message}, status_code=400)
+    request = LambdaRequest.parse(event)
+    payload = request.payload
+    artifact_prefix = payload.get("artifact_prefix")
+    if not artifact_prefix:
+        message = "Request payload must include 'artifact_prefix'."
+        if request.is_http:
+            return request.bad_request(message, status_code=400)
         raise KeyError(message)
-    next_token = event.get("next_token")
-    start_time = event.get("start_time")
-    end_time = event.get("end_time")
+    next_token = payload.get("next_token")
+    start_time = payload.get("start_time")
+    end_time = payload.get("end_time")
 
     log_group = os.environ[LOG_GROUP_ENV]
     lookback_ms = int(os.environ.get(DEFAULT_LOOKBACK_MS_ENV, "3600000"))
@@ -60,7 +38,7 @@ def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
 
     params: Dict[str, Any] = {
         "logGroupName": log_group,
-        "filterPattern": f'"{job_id}"',
+        "filterPattern": f'"{artifact_prefix}"',
        "startTime": int(start_time),
     }
     if end_time is not None:
@@ -75,10 +53,8 @@ def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
     ]
 
     body = {
-        "job_id": job_id,
+        "artifact_prefix": artifact_prefix,
         "events": events,
         "next_token": response.get("nextToken"),
     }
-    if _is_http_event(raw_event):
-        return _http_response(body)
-    return body
+    return request.respond(body)
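A hedged example of calling the refactored log poller directly with a plain dict event (the non-HTTP path). It assumes AWS credentials and the `RUNNER_LOG_GROUP` environment variable are configured; the `artifact_prefix` value is a placeholder:

```python
# Illustrative direct invocation of the log poller (no Function URL involved).
from compose_runner.aws_lambda import log_poll_handler

event = {
    "artifact_prefix": "00000000-0000-0000-0000-000000000000",  # placeholder
    # "start_time", "end_time", and "next_token" are optional; when omitted the
    # handler falls back to the DEFAULT_LOOKBACK_MS window.
}
response = log_poll_handler.handler(event, None)
# Non-HTTP invocations receive the body directly, keyed by artifact_prefix.
for log_event in response["events"]:
    print(log_event)
print("next_token:", response["next_token"])
```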
@@ -1,13 +1,13 @@
 from __future__ import annotations
 
 import os
-import base64
-import json
 from datetime import datetime, timezone
 from typing import Any, Dict, List
 
 import boto3
 
+from compose_runner.aws_lambda.common import LambdaRequest
+
 _S3 = boto3.client("s3", region_name=os.environ.get("AWS_REGION", "us-east-1"))
 
 RESULTS_BUCKET_ENV = "RESULTS_BUCKET"
@@ -21,44 +21,21 @@ def _serialize_dt(value: datetime) -> str:
     return value.astimezone(timezone.utc).isoformat()
 
 
-def _is_http_event(event: Any) -> bool:
-    return isinstance(event, dict) and "requestContext" in event
-
-
-def _extract_payload(event: Dict[str, Any]) -> Dict[str, Any]:
-    if not _is_http_event(event):
-        return event
-    body = event.get("body")
-    if not body:
-        return {}
-    if event.get("isBase64Encoded"):
-        body = base64.b64decode(body).decode("utf-8")
-    return json.loads(body)
-
-
-def _http_response(body: Dict[str, Any], status_code: int = 200) -> Dict[str, Any]:
-    return {
-        "statusCode": status_code,
-        "headers": {"Content-Type": "application/json"},
-        "body": json.dumps(body),
-    }
-
-
 def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
-    raw_event = event
-    event = _extract_payload(event)
+    request = LambdaRequest.parse(event)
+    payload = request.payload
     bucket = os.environ[RESULTS_BUCKET_ENV]
     prefix = os.environ.get(RESULTS_PREFIX_ENV)
 
-    job_id = event.get("job_id")
-    if not job_id:
-        message = "Request payload must include 'job_id'."
-        if _is_http_event(raw_event):
-            return _http_response({"status": "FAILED", "error": message}, status_code=400)
+    artifact_prefix = payload.get("artifact_prefix")
+    if not artifact_prefix:
+        message = "Request payload must include 'artifact_prefix'."
+        if request.is_http:
+            return request.bad_request(message, status_code=400)
         raise KeyError(message)
-    expires_in = int(event.get("expires_in", DEFAULT_EXPIRES_IN))
+    expires_in = int(payload.get("expires_in", DEFAULT_EXPIRES_IN))
 
-    key_prefix = f"{prefix.rstrip('/')}/{job_id}" if prefix else job_id
+    key_prefix = f"{prefix.rstrip('/')}/{artifact_prefix}" if prefix else artifact_prefix
 
     response = _S3.list_objects_v2(Bucket=bucket, Prefix=key_prefix)
     contents = response.get("Contents", [])
@@ -84,11 +61,9 @@ def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
         )
 
     body = {
-        "job_id": job_id,
+        "artifact_prefix": artifact_prefix,
         "artifacts": artifacts,
         "bucket": bucket,
         "prefix": key_prefix,
     }
-    if _is_http_event(raw_event):
-        return _http_response(body)
-    return body
+    return request.respond(body)
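Similarly, a sketch of a direct invocation of the results fetcher. It assumes `RESULTS_BUCKET` (and optionally `RESULTS_PREFIX`) are set in the environment; the prefix value and expiry are placeholders:

```python
# Illustrative direct invocation of the results fetcher.
from compose_runner.aws_lambda import results_handler

event = {
    "artifact_prefix": "00000000-0000-0000-0000-000000000000",  # placeholder
    "expires_in": 900,  # presigned URL lifetime in seconds; defaults to DEFAULT_EXPIRES_IN
}
response = results_handler.handler(event, None)
print(response["bucket"], response["prefix"])
# One entry per object stored under <RESULTS_PREFIX>/<artifact_prefix>.
for artifact in response["artifacts"]:
    print(artifact)
```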
@@ -0,0 +1,115 @@
+from __future__ import annotations
+
+import json
+import logging
+import os
+import uuid
+from typing import Any, Dict, Optional
+
+import boto3
+from botocore.exceptions import ClientError
+
+from compose_runner.aws_lambda.common import LambdaRequest
+
+logger = logging.getLogger(__name__)
+logger.setLevel(logging.INFO)
+
+_SFN_CLIENT = boto3.client("stepfunctions", region_name=os.environ.get("AWS_REGION", "us-east-1"))
+
+STATE_MACHINE_ARN_ENV = "STATE_MACHINE_ARN"
+RESULTS_BUCKET_ENV = "RESULTS_BUCKET"
+RESULTS_PREFIX_ENV = "RESULTS_PREFIX"
+NSC_KEY_ENV = "NSC_KEY"
+NV_KEY_ENV = "NV_KEY"
+
+
+def _log(job_id: str, message: str, **details: Any) -> None:
+    payload = {"job_id": job_id, "message": message, **details}
+    # Ensure consistent JSON logging for ingestion/filtering.
+    logger.info(json.dumps(payload))
+
+
+def _job_input(
+    payload: Dict[str, Any],
+    artifact_prefix: str,
+    bucket: Optional[str],
+    prefix: Optional[str],
+    nsc_key: Optional[str],
+    nv_key: Optional[str],
+) -> Dict[str, Any]:
+    no_upload_flag = bool(payload.get("no_upload", False))
+    doc: Dict[str, Any] = {
+        "artifact_prefix": artifact_prefix,
+        "meta_analysis_id": payload["meta_analysis_id"],
+        "environment": payload.get("environment", "production"),
+        "no_upload": "true" if no_upload_flag else "false",
+        "results": {"bucket": bucket or "", "prefix": prefix or ""},
+    }
+    n_cores = payload.get("n_cores")
+    doc["n_cores"] = str(n_cores) if n_cores is not None else ""
+    if nsc_key is not None:
+        doc["nsc_key"] = nsc_key
+    else:
+        doc["nsc_key"] = ""
+    if nv_key is not None:
+        doc["nv_key"] = nv_key
+    else:
+        doc["nv_key"] = ""
+    return doc
+
+
+def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
+    request = LambdaRequest.parse(event)
+    payload = request.payload
+    if STATE_MACHINE_ARN_ENV not in os.environ:
+        raise RuntimeError(f"{STATE_MACHINE_ARN_ENV} environment variable must be set.")
+
+    if "meta_analysis_id" not in payload:
+        message = "Request payload must include 'meta_analysis_id'."
+        if request.is_http:
+            return request.bad_request(message, status_code=400)
+        raise KeyError(message)
+
+    artifact_prefix = payload.get("artifact_prefix") or str(uuid.uuid4())
+    bucket = os.environ.get(RESULTS_BUCKET_ENV)
+    prefix = os.environ.get(RESULTS_PREFIX_ENV)
+    nsc_key = payload.get("nsc_key") or os.environ.get(NSC_KEY_ENV)
+    nv_key = payload.get("nv_key") or os.environ.get(NV_KEY_ENV)
+
+    job_input = _job_input(payload, artifact_prefix, bucket, prefix, nsc_key, nv_key)
+    params = {
+        "stateMachineArn": os.environ[STATE_MACHINE_ARN_ENV],
+        "name": artifact_prefix,
+        "input": json.dumps(job_input),
+    }
+
+    try:
+        response = _SFN_CLIENT.start_execution(**params)
+    except _SFN_CLIENT.exceptions.ExecutionAlreadyExists as exc:
+        _log(artifact_prefix, "workflow.duplicate", error=str(exc))
+        body = {
+            "status": "FAILED",
+            "error": "A job with the provided artifact_prefix already exists.",
+            "artifact_prefix": artifact_prefix,
+        }
+        if request.is_http:
+            return request.respond(body, status_code=409)
+        raise ValueError(body["error"]) from exc
+    except ClientError as exc:
+        _log(artifact_prefix, "workflow.failed_to_queue", error=str(exc))
+        message = "Failed to start compose-runner job."
+        body = {"status": "FAILED", "error": message}
+        if request.is_http:
+            return request.respond(body, status_code=500)
+        raise RuntimeError(message) from exc
+
+    execution_arn = response["executionArn"]
+    _log(artifact_prefix, "workflow.queued", execution_arn=execution_arn)
+
+    body = {
+        "job_id": execution_arn,
+        "artifact_prefix": artifact_prefix,
+        "status": "SUBMITTED",
+        "status_url": f"/jobs/{execution_arn}",
+    }
+    return request.respond(body, status_code=202)
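A hedged sketch of invoking the new submit handler directly with a plain dict event (the non-HTTP path). It assumes `STATE_MACHINE_ARN` points at a deployed state machine and AWS credentials are available; the `meta_analysis_id` is a placeholder:

```python
# Illustrative direct invocation of the submit handler.
from compose_runner.aws_lambda import run_handler

event = {
    "meta_analysis_id": "abc123",  # placeholder; required, otherwise 400 / KeyError
    "environment": "staging",      # optional; defaults to "production"
    "no_upload": True,             # forwarded to the ECS task as the string "true"
    "n_cores": 4,                  # forwarded as the string "4"
}
response = run_handler.handler(event, None)
# Non-HTTP invocations get the body directly: job_id is the execution ARN and
# artifact_prefix is a generated UUID when not supplied in the payload.
print(response["job_id"], response["artifact_prefix"], response["status"])
```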
@@ -0,0 +1,102 @@
+from __future__ import annotations
+
+import json
+import os
+from datetime import datetime
+from typing import Any, Dict, Optional
+
+import boto3
+from botocore.exceptions import ClientError
+
+from compose_runner.aws_lambda.common import LambdaRequest
+
+_SFN = boto3.client("stepfunctions", region_name=os.environ.get("AWS_REGION", "us-east-1"))
+_S3 = boto3.client("s3", region_name=os.environ.get("AWS_REGION", "us-east-1"))
+
+RESULTS_BUCKET_ENV = "RESULTS_BUCKET"
+RESULTS_PREFIX_ENV = "RESULTS_PREFIX"
+METADATA_FILENAME = "metadata.json"
+
+
+def _serialize_dt(value: datetime) -> str:
+    return value.astimezone().isoformat()
+
+
+def _metadata_key(prefix: Optional[str], artifact_prefix: str) -> str:
+    if prefix:
+        return f"{prefix.rstrip('/')}/{artifact_prefix}/{METADATA_FILENAME}"
+    return f"{artifact_prefix}/{METADATA_FILENAME}"
+
+
+def _load_metadata(bucket: str, prefix: Optional[str], artifact_prefix: str) -> Optional[Dict[str, Any]]:
+    key = _metadata_key(prefix, artifact_prefix)
+    try:
+        response = _S3.get_object(Bucket=bucket, Key=key)
+    except ClientError as error:
+        if error.response["Error"]["Code"] in {"NoSuchKey", "404"}:
+            return None
+        raise
+    data = response["Body"].read()
+    return json.loads(data.decode("utf-8"))
+
+
+def _parse_output(output: Optional[str]) -> Dict[str, Any]:
+    if not output:
+        return {}
+    try:
+        return json.loads(output)
+    except json.JSONDecodeError:
+        return {"raw_output": output}
+
+
+def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
+    request = LambdaRequest.parse(event)
+    payload = request.payload
+
+    job_id = payload.get("job_id")
+    if not job_id:
+        message = "Request payload must include 'job_id'."
+        if request.is_http:
+            return request.bad_request(message, status_code=400)
+        raise KeyError(message)
+
+    try:
+        description = _SFN.describe_execution(executionArn=job_id)
+    except ClientError as error:
+        body = {"status": "FAILED", "error": error.response["Error"]["Message"]}
+        if request.is_http:
+            status_code = 404 if error.response["Error"]["Code"] == "ExecutionDoesNotExist" else 500
+            return request.respond(body, status_code=status_code)
+        raise
+
+    status = description["status"]
+    body: Dict[str, Any] = {
+        "job_id": job_id,
+        "status": status,
+        "start_time": _serialize_dt(description["startDate"]),
+    }
+    if "stopDate" in description:
+        body["stop_time"] = _serialize_dt(description["stopDate"])
+
+    output_doc = _parse_output(description.get("output"))
+    body["output"] = output_doc
+
+    artifact_prefix = description.get("name")
+    if not artifact_prefix:
+        raise ValueError("Execution does not expose a name; cannot determine artifact prefix.")
+    body["artifact_prefix"] = artifact_prefix
+
+    if status in {"SUCCEEDED", "FAILED"}:
+        results_info = output_doc.get("results") or {}
+        bucket = results_info.get("bucket") or os.environ.get(RESULTS_BUCKET_ENV)
+        prefix = results_info.get("prefix") or os.environ.get(RESULTS_PREFIX_ENV)
+
+        if bucket and artifact_prefix:
+            metadata = _load_metadata(bucket, prefix, artifact_prefix)
+            if metadata:
+                body["result"] = metadata
+
+    if status == "FAILED":
+        body["error"] = output_doc.get("error")
+
+    return request.respond(body)
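Finally, a sketch of checking a submitted job with the new status handler via direct invocation. The execution ARN is a placeholder for the `job_id` returned by the submit handler:

```python
# Illustrative direct invocation of the status handler.
from compose_runner.aws_lambda import status_handler

job_id = "arn:aws:states:us-east-1:123456789012:execution:ComposeRunner:example"  # placeholder
body = status_handler.handler({"job_id": job_id}, None)
# The body always carries job_id, status, start_time, output, and artifact_prefix;
# stop_time, result (the uploaded metadata.json), and error appear once the
# execution has finished.
print(body["status"], body.get("result"))
```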