ddeutil-workflow 0.0.12__tar.gz → 0.0.13__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {ddeutil_workflow-0.0.12/src/ddeutil_workflow.egg-info → ddeutil_workflow-0.0.13}/PKG-INFO +78 -32
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/README.md +74 -29
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/pyproject.toml +6 -2
- ddeutil_workflow-0.0.13/src/ddeutil/workflow/__about__.py +1 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil/workflow/job.py +73 -42
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil/workflow/on.py +5 -2
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil/workflow/scheduler.py +30 -29
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil/workflow/stage.py +43 -10
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13/src/ddeutil_workflow.egg-info}/PKG-INFO +78 -32
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil_workflow.egg-info/SOURCES.txt +0 -1
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil_workflow.egg-info/requires.txt +2 -2
- ddeutil_workflow-0.0.13/tests/test__conf_exist.py +12 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_job_py.py +17 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_on.py +2 -2
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_scheduler.py +5 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_stage.py +13 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_stage_py.py +21 -0
- ddeutil_workflow-0.0.12/src/ddeutil/workflow/__about__.py +0 -1
- ddeutil_workflow-0.0.12/tests/test__conf_exist.py +0 -11
- ddeutil_workflow-0.0.12/tests/test__local_and_global.py +0 -158
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/LICENSE +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/setup.cfg +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil/workflow/__init__.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil/workflow/__types.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil/workflow/api.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil/workflow/cli.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil/workflow/cron.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil/workflow/exceptions.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil/workflow/log.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil/workflow/repeat.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil/workflow/route.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil/workflow/utils.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil_workflow.egg-info/dependency_links.txt +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil_workflow.egg-info/entry_points.txt +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/src/ddeutil_workflow.egg-info/top_level.txt +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test__regex.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_conf.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_cron.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_job.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_log.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_params.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_pipeline.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_pipeline_desc.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_pipeline_if.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_pipeline_matrix.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_pipeline_on.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_pipeline_params.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_pipeline_run.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_pipeline_run_raise.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_pipeline_task.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_poke.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_stage_bash.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_stage_condition.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_stage_hook.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_stage_trigger.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_utils.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_utils_result.py +0 -0
- {ddeutil_workflow-0.0.12 → ddeutil_workflow-0.0.13}/tests/test_utils_template.py +0 -0
--- ddeutil_workflow-0.0.12/src/ddeutil_workflow.egg-info/PKG-INFO
+++ ddeutil_workflow-0.0.13/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: ddeutil-workflow
-Version: 0.0.12
+Version: 0.0.13
 Summary: Lightweight workflow orchestration with less dependencies
 Author-email: ddeutils <korawich.anu@gmail.com>
 License: MIT
@@ -18,19 +18,21 @@ Classifier: Programming Language :: Python :: 3.9
 Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
 Requires-Python: >=3.9.13
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: ddeutil-io
+Requires-Dist: ddeutil-io>=0.1.12
 Requires-Dist: python-dotenv==1.0.1
 Requires-Dist: typer<1.0.0,==0.12.5
 Requires-Dist: schedule<2.0.0,==1.2.2
 Provides-Extra: api
-Requires-Dist: fastapi<1.0.0
+Requires-Dist: fastapi<1.0.0,>=0.114.1; extra == "api"
 
 # Workflow
 
 [![test](https://github.com/ddeutils/ddeutil-workflow/actions/workflows/tests.yml/badge.svg?branch=main)](https://github.com/ddeutils/ddeutil-workflow/actions/workflows/tests.yml)
+[![pypi version](https://img.shields.io/pypi/v/ddeutil-workflow)](https://pypi.org/project/ddeutil-workflow/)
 [![python support version](https://img.shields.io/pypi/pyversions/ddeutil-workflow)](https://pypi.org/project/ddeutil-workflow/)
 [![size](https://img.shields.io/github/languages/code-size/ddeutils/ddeutil-workflow)](https://github.com/ddeutils/ddeutil-workflow)
 [![gh license](https://img.shields.io/github/license/ddeutils/ddeutil-workflow)](https://github.com/ddeutils/ddeutil-workflow/blob/main/LICENSE)
@@ -41,8 +43,9 @@ for easy to make a simple metadata driven for data workflow orchestration.
 It can to use for data operator by a `.yaml` template.
 
 > [!WARNING]
-> This package provide only orchestration workload. That mean you should not
-> workflow stage to process any large data which use lot of compute
+> This package provide only orchestration workload task. That mean you should not
+> use the workflow stage to process any large volume data which use lot of compute
+> resource. :cold_sweat:
 
 In my opinion, I think it should not create duplicate workflow codes if I can
 write with dynamic input parameters on the one template workflow that just change
@@ -50,23 +53,25 @@ the input parameters per use-case instead.
 This way I can handle a lot of logical workflows in our orgs with only metadata
 configuration. It called **Metadata Driven Data Workflow**.
 
-
-workflow running. Because it not show us what is a use-case that running data
-workflow.
+**:pushpin: <u>Rules of This Workflow engine</u>**:
 
-
-
-
-
-
-**Rules of This Workflow engine**:
+1. Minimum frequency unit of scheduling is **1 minute** :warning:
+2. Can not re-run only failed stage and its pending downstream :rotating_light:
+3. All parallel tasks inside workflow engine use Multi-Threading
+   (Because Python 3.13 unlock GIL :unlock:)
 
-
-
-
-
+> [!NOTE]
+> _Disclaimer_: I inspire the dynamic statement from the [**GitHub Action**](https://github.com/features/actions)
+> `.yml` files and all of config file from several data orchestration framework
+> tools from my experience on Data Engineer. :grimacing:
+>
+> Other workflow that I interest on them and pick some interested feature to this
+> package:
+>
+> - [Google **Workflows**](https://cloud.google.com/workflows)
+> - [AWS **Step Functions**](https://aws.amazon.com/step-functions/)
 
-## Installation
+## :round_pushpin: Installation
 
 This project need `ddeutil-io` extension namespace packages. If you want to install
 this package with application add-ons, you should add `app` in installation;
@@ -79,7 +84,7 @@ this package with application add-ons, you should add `app` in installation;
 
 > I added this feature to the main milestone.
 >
-> **Docker Images** supported:
+> :egg: **Docker Images** supported:
 >
 > | Docker Image                | Python Version | Support |
 > |-----------------------------|----------------|---------|
@@ -88,7 +93,7 @@ this package with application add-ons, you should add `app` in installation;
 > | ddeutil-workflow:python3.11 | `3.11`         | :x:     |
 > | ddeutil-workflow:python3.12 | `3.12`         | :x:     |
 
-## Usage
+## :beers: Usage
 
 This is examples that use workflow file for running common Data Engineering
 use-case.
@@ -100,8 +105,10 @@ use-case.
 > maintenance your data workflows.
 
 ```yaml
-run_py_local:
-
+run-py-local:
+
+  # Validate model that use to parsing exists for template file
+  type: ddeutil.workflow.Workflow
   on:
     # If workflow deploy to schedule, it will running every 5 minutes
     # with Asia/Bangkok timezone.
@@ -110,7 +117,7 @@ run_py_local:
   params:
     # Incoming execution parameters will validate with this type. It allow
     # to set default value or templating.
-
+    source-extract: str
     run-date: datetime
   jobs:
     getting-api-data:
@@ -119,17 +126,56 @@ run_py_local:
           id: retrieve-api
           uses: tasks/get-api-with-oauth-to-s3@requests
          with:
-
-
-
-
-
-
+            # Arguments of source data that want to retrieve.
+            method: post
+            url: https://finances/open-data/currency-pairs/
+            body:
+              resource: ${{ params.source-extract }}
+
+              # You can able to use filtering like Jinja template but this
+              # package does not use it.
+              filter: ${{ params.run-date | fmt(fmt='%Y%m%d') }}
+            auth:
+              type: bearer
+              keys: ${API_ACCESS_REFRESH_TOKEN}
+
+            # Arguments of target data that want to landing.
+            writing_mode: flatten
+            aws_s3_path: my-data/open-data/${{ params.source-extract }}
+
+            # This Authentication code should implement with your custom hook
+            # function. The template allow you to use environment variable.
             aws_access_client_id: ${AWS_ACCESS_CLIENT_ID}
             aws_access_client_secret: ${AWS_ACCESS_CLIENT_SECRET}
 ```
 
-
+The above workflow template is main executor pipeline that you want to do. If you
+want to schedule this workflow, you want to dynamic its parameters change base on
+execution time such as `run-date` should change base on that workflow running date.
+
+So, this package provide the `Schedule` template for this action.
+
+```yaml
+schedule-run-local-wf:
+
+  # Validate model that use to parsing exists for template file
+  type: ddeutil.workflow.scheduler.Schedule
+  workflows:
+
+    # Map existing workflow that want to deploy with scheduler application.
+    # It allow you to passing release parameter that dynamic change depend the
+    # current context of this scheduler application releasing that time.
+    - name: run-py-local
+      params:
+        source-extract: "USD-THB"
+        asat-dt: "${{ release.logical_date }}"
+```
+
+## :cookie: Configuration
+
+The main configuration that use to dynamic changing with your propose of this
+application. If any configuration values do not set yet, it will use default value
+and do not raise any error to you.
 
 | Environment | Component | Default | Description |
 |-------------------------------------|-----------|----------------------------------|----------------------------------------------------------------------------|
@@ -155,7 +201,7 @@ run_py_local:
 | `WORKFLOW_API_ENABLE_ROUTE_WORKFLOW` | API | true | A flag that enable workflow route to manage execute manually and workflow logging |
 | `WORKFLOW_API_ENABLE_ROUTE_SCHEDULE` | API | true | A flag that enable run scheduler |
 
-## Deployment
+## :rocket: Deployment
 
 This package able to run as a application service for receive manual trigger
 from the master node via RestAPI or use to be Scheduler background service
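The metadata changes above tighten previously unpinned dependencies. As a quick illustration (using the third-party `packaging` library, which is not part of ddeutil-workflow), the new `Requires-Dist` line for fastapi parses into a version window plus an extra marker:

```python
from packaging.requirements import Requirement

# The new Requires-Dist line from the PKG-INFO diff above.
req = Requirement('fastapi<1.0.0,>=0.114.1; extra == "api"')

print(req.name)                               # fastapi
print(req.specifier)                          # <1.0.0,>=0.114.1
print(req.specifier.contains("0.114.1"))      # True: lower bound is inclusive
print(req.specifier.contains("1.0.0"))        # False: upper bound is exclusive
print(req.marker.evaluate({"extra": "api"}))  # True: pulled in only with the [api] extra
```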
--- ddeutil_workflow-0.0.12/README.md
+++ ddeutil_workflow-0.0.13/README.md
@@ -1,6 +1,7 @@
 # Workflow
 
 [![test](https://github.com/ddeutils/ddeutil-workflow/actions/workflows/tests.yml/badge.svg?branch=main)](https://github.com/ddeutils/ddeutil-workflow/actions/workflows/tests.yml)
+[![pypi version](https://img.shields.io/pypi/v/ddeutil-workflow)](https://pypi.org/project/ddeutil-workflow/)
 [![python support version](https://img.shields.io/pypi/pyversions/ddeutil-workflow)](https://pypi.org/project/ddeutil-workflow/)
 [![size](https://img.shields.io/github/languages/code-size/ddeutils/ddeutil-workflow)](https://github.com/ddeutils/ddeutil-workflow)
 [![gh license](https://img.shields.io/github/license/ddeutils/ddeutil-workflow)](https://github.com/ddeutils/ddeutil-workflow/blob/main/LICENSE)
@@ -11,8 +12,9 @@ for easy to make a simple metadata driven for data workflow orchestration.
 It can to use for data operator by a `.yaml` template.
 
 > [!WARNING]
-> This package provide only orchestration workload. That mean you should not
-> workflow stage to process any large data which use lot of compute
+> This package provide only orchestration workload task. That mean you should not
+> use the workflow stage to process any large volume data which use lot of compute
+> resource. :cold_sweat:
 
 In my opinion, I think it should not create duplicate workflow codes if I can
 write with dynamic input parameters on the one template workflow that just change
@@ -20,23 +22,25 @@ the input parameters per use-case instead.
 This way I can handle a lot of logical workflows in our orgs with only metadata
 configuration. It called **Metadata Driven Data Workflow**.
 
-
-workflow running. Because it not show us what is a use-case that running data
-workflow.
+**:pushpin: <u>Rules of This Workflow engine</u>**:
 
-
-
-
-
-
-**Rules of This Workflow engine**:
+1. Minimum frequency unit of scheduling is **1 minute** :warning:
+2. Can not re-run only failed stage and its pending downstream :rotating_light:
+3. All parallel tasks inside workflow engine use Multi-Threading
+   (Because Python 3.13 unlock GIL :unlock:)
 
-
-
-
-
+> [!NOTE]
+> _Disclaimer_: I inspire the dynamic statement from the [**GitHub Action**](https://github.com/features/actions)
+> `.yml` files and all of config file from several data orchestration framework
+> tools from my experience on Data Engineer. :grimacing:
+>
+> Other workflow that I interest on them and pick some interested feature to this
+> package:
+>
+> - [Google **Workflows**](https://cloud.google.com/workflows)
+> - [AWS **Step Functions**](https://aws.amazon.com/step-functions/)
 
-## Installation
+## :round_pushpin: Installation
 
 This project need `ddeutil-io` extension namespace packages. If you want to install
 this package with application add-ons, you should add `app` in installation;
@@ -49,7 +53,7 @@ this package with application add-ons, you should add `app` in installation;
 
 > I added this feature to the main milestone.
 >
-> **Docker Images** supported:
+> :egg: **Docker Images** supported:
 >
 > | Docker Image                | Python Version | Support |
 > |-----------------------------|----------------|---------|
@@ -58,7 +62,7 @@ this package with application add-ons, you should add `app` in installation;
 > | ddeutil-workflow:python3.11 | `3.11`         | :x:     |
 > | ddeutil-workflow:python3.12 | `3.12`         | :x:     |
 
-## Usage
+## :beers: Usage
 
 This is examples that use workflow file for running common Data Engineering
 use-case.
@@ -70,8 +74,10 @@ use-case.
 > maintenance your data workflows.
 
 ```yaml
-run_py_local:
-
+run-py-local:
+
+  # Validate model that use to parsing exists for template file
+  type: ddeutil.workflow.Workflow
   on:
     # If workflow deploy to schedule, it will running every 5 minutes
     # with Asia/Bangkok timezone.
@@ -80,7 +86,7 @@ run_py_local:
   params:
     # Incoming execution parameters will validate with this type. It allow
     # to set default value or templating.
-
+    source-extract: str
     run-date: datetime
   jobs:
     getting-api-data:
@@ -89,17 +95,56 @@ run_py_local:
           id: retrieve-api
           uses: tasks/get-api-with-oauth-to-s3@requests
          with:
-
-
-
-
-
-
+            # Arguments of source data that want to retrieve.
+            method: post
+            url: https://finances/open-data/currency-pairs/
+            body:
+              resource: ${{ params.source-extract }}
+
+              # You can able to use filtering like Jinja template but this
+              # package does not use it.
+              filter: ${{ params.run-date | fmt(fmt='%Y%m%d') }}
+            auth:
+              type: bearer
+              keys: ${API_ACCESS_REFRESH_TOKEN}
+
+            # Arguments of target data that want to landing.
+            writing_mode: flatten
+            aws_s3_path: my-data/open-data/${{ params.source-extract }}
+
+            # This Authentication code should implement with your custom hook
+            # function. The template allow you to use environment variable.
             aws_access_client_id: ${AWS_ACCESS_CLIENT_ID}
             aws_access_client_secret: ${AWS_ACCESS_CLIENT_SECRET}
 ```
 
-
+The above workflow template is main executor pipeline that you want to do. If you
+want to schedule this workflow, you want to dynamic its parameters change base on
+execution time such as `run-date` should change base on that workflow running date.
+
+So, this package provide the `Schedule` template for this action.
+
+```yaml
+schedule-run-local-wf:
+
+  # Validate model that use to parsing exists for template file
+  type: ddeutil.workflow.scheduler.Schedule
+  workflows:
+
+    # Map existing workflow that want to deploy with scheduler application.
+    # It allow you to passing release parameter that dynamic change depend the
+    # current context of this scheduler application releasing that time.
+    - name: run-py-local
+      params:
+        source-extract: "USD-THB"
+        asat-dt: "${{ release.logical_date }}"
+```
+
+## :cookie: Configuration
+
+The main configuration that use to dynamic changing with your propose of this
+application. If any configuration values do not set yet, it will use default value
+and do not raise any error to you.
 
 | Environment | Component | Default | Description |
 |-------------------------------------|-----------|----------------------------------|----------------------------------------------------------------------------|
@@ -125,7 +170,7 @@ run_py_local:
 | `WORKFLOW_API_ENABLE_ROUTE_WORKFLOW` | API | true | A flag that enable workflow route to manage execute manually and workflow logging |
 | `WORKFLOW_API_ENABLE_ROUTE_SCHEDULE` | API | true | A flag that enable run scheduler |
 
-## Deployment
+## :rocket: Deployment
 
 This package able to run as a application service for receive manual trigger
 from the master node via RestAPI or use to be Scheduler background service
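The `${{ ... }}` placeholders in the README's templates are resolved against the workflow parameters at execution time. The sketch below is only an illustration of that substitution idea, not the package's implementation, and it deliberately skips the `| fmt(...)` filter syntax shown above:

```python
import re

# Match ${{ dotted.name }} placeholders (filters like "| fmt(...)" are not handled).
TEMPLATE = re.compile(r"\$\{\{\s*([\w.\-]+)\s*\}\}")

def render(value: str, context: dict) -> str:
    """Replace each ${{ dotted.name }} with its value from a context dict."""
    def lookup(match: re.Match) -> str:
        node = context
        for part in match.group(1).split("."):
            node = node[part]  # walk "params" -> "source-extract"
        return str(node)
    return TEMPLATE.sub(lookup, value)

print(render(
    "my-data/open-data/${{ params.source-extract }}",
    {"params": {"source-extract": "USD-THB"}},
))  # -> my-data/open-data/USD-THB
```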
--- ddeutil_workflow-0.0.12/pyproject.toml
+++ ddeutil_workflow-0.0.13/pyproject.toml
@@ -22,10 +22,11 @@ classifiers = [
     "Programming Language :: Python :: 3.10",
     "Programming Language :: Python :: 3.11",
     "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
 ]
 requires-python = ">=3.9.13"
 dependencies = [
-    "ddeutil-io",
+    "ddeutil-io>=0.1.12",
     "python-dotenv==1.0.1",
     "typer==0.12.5,<1.0.0",
     "schedule==1.2.2,<2.0.0",
@@ -34,7 +35,7 @@ dynamic = ["version"]
 
 [project.optional-dependencies]
 api = [
-    "fastapi
+    "fastapi>=0.114.1,<1.0.0",
 ]
 
 [project.urls]
@@ -60,8 +61,11 @@ relative_files = true
 concurrency = ["thread", "multiprocessing"]
 source = ["ddeutil.workflow", "tests"]
 omit = [
+    "src/ddeutil/workflow/__about__.py",
     "scripts/",
     # Omit this files because it does not ready to production.
+    "src/ddeutil/workflow/api.py",
+    "src/ddeutil/workflow/cli.py",
     "src/ddeutil/workflow/repeat.py",
     "src/ddeutil/workflow/route.py",
     "tests/utils.py",
--- /dev/null
+++ ddeutil_workflow-0.0.13/src/ddeutil/workflow/__about__.py
@@ -0,0 +1 @@
+__version__: str = "0.0.13"
--- ddeutil_workflow-0.0.12/src/ddeutil/workflow/job.py
+++ ddeutil_workflow-0.0.13/src/ddeutil/workflow/job.py
@@ -14,11 +14,13 @@ from concurrent.futures import (
     as_completed,
     wait,
 )
+from functools import lru_cache
 from pickle import PickleError
 from textwrap import dedent
 from threading import Event
 from typing import Optional
 
+from ddeutil.core import freeze_args
 from pydantic import BaseModel, Field
 from pydantic.functional_validators import field_validator, model_validator
 from typing_extensions import Self
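The two new imports work as a pair: `lru_cache` keys its cache on the call arguments, so every argument must be hashable, while a strategy matrix arrives as dicts and lists. The decorator below is a sketch in the spirit of `ddeutil.core.freeze_args` as it is used in this diff (the real implementation may differ); it converts mutable containers into hashable tuples before the cached call:

```python
from functools import lru_cache, wraps

def freeze(value):
    """Recursively convert dicts, lists, and sets into hashable tuples."""
    if isinstance(value, dict):
        return tuple(sorted((k, freeze(v)) for k, v in value.items()))
    if isinstance(value, (list, set)):
        return tuple(freeze(v) for v in value)
    return value

def freeze_args_sketch(func):
    """Hypothetical stand-in for ddeutil.core.freeze_args."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*map(freeze, args), **{k: freeze(v) for k, v in kwargs.items()})
    return wrapper

@freeze_args_sketch
@lru_cache
def cached(matrix):
    return len(matrix)

# The raw dict is unhashable, but its frozen form can be an lru_cache key.
print(cached({"python": ["3.9", "3.10"]}))  # -> 1
```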
@@ -53,12 +55,70 @@ logger = get_logger("ddeutil.workflow")
|
|
53
55
|
__all__: TupleStr = (
|
54
56
|
"Strategy",
|
55
57
|
"Job",
|
58
|
+
"make",
|
56
59
|
)
|
57
60
|
|
58
61
|
|
62
|
+
@freeze_args
|
63
|
+
@lru_cache
|
64
|
+
def make(matrix, include, exclude) -> list[DictStr]:
|
65
|
+
"""Return List of product of matrix values that already filter with
|
66
|
+
exclude and add include.
|
67
|
+
|
68
|
+
:param matrix: A matrix values that want to cross product to possible
|
69
|
+
parallelism values.
|
70
|
+
:param include: A list of additional matrix that want to adds-in.
|
71
|
+
:param exclude: A list of exclude matrix that want to filter-out.
|
72
|
+
:rtype: list[DictStr]
|
73
|
+
"""
|
74
|
+
# NOTE: If it does not set matrix, it will return list of an empty dict.
|
75
|
+
if not (mt := matrix):
|
76
|
+
return [{}]
|
77
|
+
|
78
|
+
final: list[DictStr] = []
|
79
|
+
for r in cross_product(matrix=mt):
|
80
|
+
if any(
|
81
|
+
all(r[k] == v for k, v in exclude.items()) for exclude in exclude
|
82
|
+
):
|
83
|
+
continue
|
84
|
+
final.append(r)
|
85
|
+
|
86
|
+
# NOTE: If it is empty matrix and include, it will return list of an
|
87
|
+
# empty dict.
|
88
|
+
if not final and not include:
|
89
|
+
return [{}]
|
90
|
+
|
91
|
+
# NOTE: Add include to generated matrix with exclude list.
|
92
|
+
add: list[DictStr] = []
|
93
|
+
for inc in include:
|
94
|
+
# VALIDATE:
|
95
|
+
# Validate any key in include list should be a subset of some one
|
96
|
+
# in matrix.
|
97
|
+
if all(not (set(inc.keys()) <= set(m.keys())) for m in final):
|
98
|
+
raise ValueError("Include should have the keys equal to matrix")
|
99
|
+
|
100
|
+
# VALIDATE:
|
101
|
+
# Validate value of include does not duplicate with generated
|
102
|
+
# matrix.
|
103
|
+
if any(
|
104
|
+
all(inc.get(k) == v for k, v in m.items()) for m in [*final, *add]
|
105
|
+
):
|
106
|
+
continue
|
107
|
+
add.append(inc)
|
108
|
+
final.extend(add)
|
109
|
+
return final
|
110
|
+
|
111
|
+
|
59
112
|
class Strategy(BaseModel):
|
60
113
|
"""Strategy Model that will combine a matrix together for running the
|
61
|
-
special job.
|
114
|
+
special job with combination of matrix data.
|
115
|
+
|
116
|
+
This model does not be the part of job only because you can use it to
|
117
|
+
any model object. The propose of this model is generate metrix result that
|
118
|
+
comming from combination logic with any matrix values for running it with
|
119
|
+
parallelism.
|
120
|
+
|
121
|
+
[1, 2, 3] x [a, b] --> [1a], [1b], [2a], [2b], [3a], [3b]
|
62
122
|
|
63
123
|
Data Validate:
|
64
124
|
>>> strategy = {
|
@@ -105,13 +165,19 @@ class Strategy(BaseModel):
     def __prepare_keys(cls, values: DictData) -> DictData:
         """Rename key that use dash to underscore because Python does not
         support this character exist in any variable name.
+
+        :param values: A parsing values to this models
+        :rtype: DictData
         """
         dash2underscore("max-parallel", values)
         dash2underscore("fail-fast", values)
         return values
 
     def is_set(self) -> bool:
-        """Return True if this strategy was set from yaml template."""
+        """Return True if this strategy was set from yaml template.
+
+        :rtype: bool
+        """
         return len(self.matrix) > 0
 
     def make(self) -> list[DictStr]:
|
@@ -120,44 +186,7 @@ class Strategy(BaseModel):
|
|
120
186
|
|
121
187
|
:rtype: list[DictStr]
|
122
188
|
"""
|
123
|
-
|
124
|
-
if not (mt := self.matrix):
|
125
|
-
return [{}]
|
126
|
-
|
127
|
-
final: list[DictStr] = []
|
128
|
-
for r in cross_product(matrix=mt):
|
129
|
-
if any(
|
130
|
-
all(r[k] == v for k, v in exclude.items())
|
131
|
-
for exclude in self.exclude
|
132
|
-
):
|
133
|
-
continue
|
134
|
-
final.append(r)
|
135
|
-
|
136
|
-
# NOTE: If it is empty matrix and include, it will return list of an
|
137
|
-
# empty dict.
|
138
|
-
if not final and not self.include:
|
139
|
-
return [{}]
|
140
|
-
|
141
|
-
# NOTE: Add include to generated matrix with exclude list.
|
142
|
-
add: list[DictStr] = []
|
143
|
-
for include in self.include:
|
144
|
-
# VALIDATE:
|
145
|
-
# Validate any key in include list should be a subset of some one
|
146
|
-
# in matrix.
|
147
|
-
if all(not (set(include.keys()) <= set(m.keys())) for m in final):
|
148
|
-
raise ValueError("Include should have the keys equal to matrix")
|
149
|
-
|
150
|
-
# VALIDATE:
|
151
|
-
# Validate value of include does not duplicate with generated
|
152
|
-
# matrix.
|
153
|
-
if any(
|
154
|
-
all(include.get(k) == v for k, v in m.items())
|
155
|
-
for m in [*final, *add]
|
156
|
-
):
|
157
|
-
continue
|
158
|
-
add.append(include)
|
159
|
-
final.extend(add)
|
160
|
-
return final
|
189
|
+
return make(self.matrix, self.include, self.exclude)
|
161
190
|
|
162
191
|
|
163
192
|
class Job(BaseModel):
|
@@ -238,6 +267,7 @@ class Job(BaseModel):
 
     @model_validator(mode="after")
     def __prepare_running_id(self):
+        """Prepare the job running ID."""
         if self.run_id is None:
             self.run_id = gen_id(self.id or "", unique=True)
 
@@ -487,7 +517,7 @@
         stop all not done futures if it receive the first exception from all
         running futures.
 
-        :param event:
+        :param event: An event
         :param futures: A list of futures.
         :rtype: Result
         """
@@ -529,7 +559,8 @@
     def __catch_all_completed(self, futures: list[Future]) -> Result:
         """Job parallel pool futures catching with all-completed mode.
 
-        :param futures: A list of futures
+        :param futures: A list of futures that want to catch all completed
+            result.
         :rtype: Result
         """
         context: DictData = {}
--- ddeutil_workflow-0.0.12/src/ddeutil/workflow/on.py
+++ ddeutil_workflow-0.0.13/src/ddeutil/workflow/on.py
@@ -20,6 +20,7 @@ from .utils import Loader
 
 __all__: TupleStr = (
     "On",
+    "YearOn",
     "interval2crontab",
 )
 
@@ -187,8 +188,10 @@ class On(BaseModel):
         return self.cronjob.schedule(date=start, tz=self.tz).next
 
 
-class
-"""Implement On
+class YearOn(On):
+    """Implement On Year Schedule Model for limit year matrix that use by some
+    data schedule tools like AWS Glue.
+    """
 
     model_config = ConfigDict(arbitrary_types_allowed=True)
 
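`on.py` also exports an `interval2crontab` helper alongside `On` and the new `YearOn` model. Its real signature is not visible in this diff, so the function below is a purely hypothetical sketch of the idea the name implies: mapping a human-readable interval onto a five-field crontab expression.

```python
# Hypothetical sketch only: the actual interval2crontab in ddeutil-workflow
# may take different parameters and support more interval kinds.
def interval_to_crontab(interval: str, minute: int = 0) -> str:
    """Translate a named interval into a five-field crontab string."""
    table = {
        "hourly": f"{minute} * * * *",   # every hour at the given minute
        "daily": f"{minute} 0 * * *",    # every day at 00:<minute>
        "monthly": f"{minute} 0 1 * *",  # first day of every month
    }
    return table[interval]

print(interval_to_crontab("daily", minute=30))  # -> "30 0 * * *"
```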