ddeutil-workflow 0.0.4__tar.gz → 0.0.6__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {ddeutil_workflow-0.0.4/src/ddeutil_workflow.egg-info → ddeutil_workflow-0.0.6}/PKG-INFO +118 -90
- {ddeutil_workflow-0.0.4 → ddeutil_workflow-0.0.6}/README.md +111 -78
- {ddeutil_workflow-0.0.4 → ddeutil_workflow-0.0.6}/pyproject.toml +8 -18
- ddeutil_workflow-0.0.6/src/ddeutil/workflow/__about__.py +1 -0
- {ddeutil_workflow-0.0.4/src/ddeutil/workflow/tasks → ddeutil_workflow-0.0.6/src/ddeutil/workflow}/__init__.py +4 -1
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/__regex.py → ddeutil_workflow-0.0.6/src/ddeutil/workflow/__types.py +13 -3
- {ddeutil_workflow-0.0.4 → ddeutil_workflow-0.0.6}/src/ddeutil/workflow/exceptions.py +13 -1
- ddeutil_workflow-0.0.6/src/ddeutil/workflow/loader.py +80 -0
- ddeutil_workflow-0.0.6/src/ddeutil/workflow/on.py +195 -0
- ddeutil_workflow-0.0.6/src/ddeutil/workflow/pipeline.py +497 -0
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/vendors/__schedule.py → ddeutil_workflow-0.0.6/src/ddeutil/workflow/scheduler.py +222 -176
- ddeutil_workflow-0.0.6/src/ddeutil/workflow/stage.py +402 -0
- ddeutil_workflow-0.0.6/src/ddeutil/workflow/utils.py +378 -0
- {ddeutil_workflow-0.0.4 → ddeutil_workflow-0.0.6/src/ddeutil_workflow.egg-info}/PKG-INFO +118 -90
- ddeutil_workflow-0.0.6/src/ddeutil_workflow.egg-info/SOURCES.txt +34 -0
- ddeutil_workflow-0.0.6/src/ddeutil_workflow.egg-info/requires.txt +7 -0
- ddeutil_workflow-0.0.6/tests/test__conf_exist.py +11 -0
- ddeutil_workflow-0.0.4/tests/test_base_local_and_global.py → ddeutil_workflow-0.0.6/tests/test__local_and_global.py +4 -4
- ddeutil_workflow-0.0.4/tests/test_base_regex.py → ddeutil_workflow-0.0.6/tests/test__regex.py +11 -3
- ddeutil_workflow-0.0.4/tests/test_schedule.py → ddeutil_workflow-0.0.6/tests/test_on.py +22 -4
- {ddeutil_workflow-0.0.4 → ddeutil_workflow-0.0.6}/tests/test_pipeline.py +3 -9
- ddeutil_workflow-0.0.6/tests/test_pipeline_desc.py +11 -0
- ddeutil_workflow-0.0.6/tests/test_pipeline_if.py +28 -0
- ddeutil_workflow-0.0.6/tests/test_pipeline_matrix.py +87 -0
- ddeutil_workflow-0.0.6/tests/test_pipeline_on.py +12 -0
- {ddeutil_workflow-0.0.4 → ddeutil_workflow-0.0.6}/tests/test_pipeline_params.py +1 -1
- {ddeutil_workflow-0.0.4 → ddeutil_workflow-0.0.6}/tests/test_pipeline_run.py +44 -33
- {ddeutil_workflow-0.0.4 → ddeutil_workflow-0.0.6}/tests/test_pipeline_task.py +11 -13
- ddeutil_workflow-0.0.6/tests/test_scheduler.py +118 -0
- ddeutil_workflow-0.0.6/tests/test_stage_trigger.py +10 -0
- ddeutil_workflow-0.0.6/tests/test_utils.py +8 -0
- ddeutil_workflow-0.0.6/tests/test_utils_result.py +22 -0
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/__about__.py +0 -1
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/__init__.py +0 -0
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/__types.py +0 -12
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/conn.py +0 -240
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/loader.py +0 -174
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/pipeline.py +0 -517
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/schedule.py +0 -82
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/tasks/_pandas.py +0 -54
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/tasks/_polars.py +0 -92
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/utils.py +0 -187
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/vendors/__dataset.py +0 -127
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/vendors/__dict.py +0 -333
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/vendors/__init__.py +0 -0
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/vendors/aws.py +0 -185
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/vendors/az.py +0 -0
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/vendors/minio.py +0 -11
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/vendors/pd.py +0 -13
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/vendors/pg.py +0 -11
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/vendors/pl.py +0 -172
- ddeutil_workflow-0.0.4/src/ddeutil/workflow/vendors/sftp.py +0 -209
- ddeutil_workflow-0.0.4/src/ddeutil_workflow.egg-info/SOURCES.txt +0 -44
- ddeutil_workflow-0.0.4/src/ddeutil_workflow.egg-info/requires.txt +0 -12
- ddeutil_workflow-0.0.4/tests/test_base_data.py +0 -14
- ddeutil_workflow-0.0.4/tests/test_conn.py +0 -93
- ddeutil_workflow-0.0.4/tests/test_dataset.py +0 -90
- ddeutil_workflow-0.0.4/tests/test_loader.py +0 -6
- ddeutil_workflow-0.0.4/tests/test_pipeline_matrix.py +0 -29
- {ddeutil_workflow-0.0.4 → ddeutil_workflow-0.0.6}/LICENSE +0 -0
- {ddeutil_workflow-0.0.4 → ddeutil_workflow-0.0.6}/setup.cfg +0 -0
- {ddeutil_workflow-0.0.4 → ddeutil_workflow-0.0.6}/src/ddeutil_workflow.egg-info/dependency_links.txt +0 -0
- {ddeutil_workflow-0.0.4 → ddeutil_workflow-0.0.6}/src/ddeutil_workflow.egg-info/top_level.txt +0 -0
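The file list above amounts to a module re-layout: the `vendors/` connection and dataset code is dropped, `schedule.py` is replaced by `on.py`, `vendors/__schedule.py` becomes `scheduler.py`, and new `loader.py`, `stage.py`, and `utils.py` modules appear. A minimal, illustrative sketch of the import paths this implies for 0.0.6 (names taken from the README examples in the diff below; not an exhaustive API listing):

```python
# Illustrative sketch only: import paths implied by the 0.0.6 file layout and the
# README examples shown in the diff below.
from ddeutil.workflow.on import On              # schedule trigger object (new on.py)
from ddeutil.workflow.pipeline import Pipeline  # core pipeline object (pipeline.py)

# Removed in this release (0.0.4 API, per the old README):
#   Schedule.from_loader(...)          -> replaced by On.from_loader(...)
#   ddeutil.workflow.vendors.pg, etc.  -> vendor connection/dataset modules dropped
```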
--- ddeutil_workflow-0.0.4/src/ddeutil_workflow.egg-info/PKG-INFO
+++ ddeutil_workflow-0.0.6/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: ddeutil-workflow
-Version: 0.0.4
+Version: 0.0.6
 Summary: Data Developer & Engineer Workflow Utility Objects
 Author-email: ddeutils <korawich.anu@gmail.com>
 License: MIT
@@ -9,7 +9,7 @@ Project-URL: Source Code, https://github.com/ddeutils/ddeutil-workflow/
 Keywords: data,workflow,utility,pipeline
 Classifier: Topic :: Utilities
 Classifier: Natural Language :: English
-Classifier: Development Status ::
+Classifier: Development Status :: 4 - Beta
 Classifier: Intended Audience :: Developers
 Classifier: Operating System :: OS Independent
 Classifier: Programming Language :: Python
@@ -23,35 +23,33 @@ Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: fmtutil
 Requires-Dist: ddeutil-io
-Requires-Dist: python-dotenv
-Provides-Extra:
-Requires-Dist:
-Requires-Dist:
-
-
-Requires-Dist: fsspec==2024.5.0; extra == "test"
-Requires-Dist: polars==0.20.31; extra == "test"
-Requires-Dist: pyarrow==16.1.0; extra == "test"
-
-# Data Utility: _Workflow_
+Requires-Dist: python-dotenv==1.0.1
+Provides-Extra: app
+Requires-Dist: fastapi==0.112.0; extra == "app"
+Requires-Dist: apscheduler[sqlalchemy]==3.10.4; extra == "app"
+
+# Workflow
 
 [](https://github.com/ddeutils/ddeutil-workflow/actions/workflows/tests.yml)
 [](https://pypi.org/project/ddeutil-workflow/)
 [](https://github.com/ddeutils/ddeutil-workflow)
+[](https://github.com/ddeutils/ddeutil-workflow/blob/main/LICENSE)
 
 **Table of Contents**:
 
 - [Installation](#installation)
 - [Getting Started](#getting-started)
-
-  - [
-  - [
-  - [
-  - [Python](#python)
-  - [
-  - [
-
-
+- [Core Features](#core-features)
+  - [On](#on)
+  - [Pipeline](#pipeline)
+- [Usage](#usage)
+  - [Python & Bash](#python--bash)
+  - [Hook (EL)](#hook-extract--load)
+  - [Hook (T)](#hook-transform)
+- [Configuration](#configuration)
+- [Deployment](#deployment)
+
+This **Workflow** objects was created for easy to make a simple metadata
 driven pipeline that able to **ETL, T, EL, or ELT** by `.yaml` file.
 
 I think we should not create the multiple pipeline per use-case if we able to
@@ -74,13 +72,18 @@ pipeline.
 pip install ddeutil-workflow
 ```
 
-This project need `ddeutil-io
+This project need `ddeutil-io` extension namespace packages. If you want to install
+this package with application add-ons, you should add `app` in installation;
+
+```shell
+pip install ddeutil-workflow[app]
+```
 
 ## Getting Started
 
 The first step, you should start create the connections and datasets for In and
 Out of you data that want to use in pipeline of workflow. Some of this component
-is similar component of the **Airflow** because I like it concepts.
+is similar component of the **Airflow** because I like it orchestration concepts.
 
 The main feature of this project is the `Pipeline` object that can call any
 registries function. The pipeline can handle everything that you want to do, it
@@ -91,88 +94,84 @@ will passing parameters and catching the output for re-use it to next step.
 > dynamic registries instead of main features because it have a lot of maintain
 > vendor codes and deps. (I do not have time to handle this features)
 
-###
+### On
 
-The
+The **On** is schedule object.
 
 ```yaml
-
-type:
-
+on_every_5_min:
+  type: on.On
+  cron: "*/5 * * * *"
 ```
 
 ```python
-from ddeutil.workflow.
+from ddeutil.workflow.on import On
 
-
-assert
-```
+schedule = On.from_loader(name='on_every_5_min', externals={})
+assert '*/5 * * * *' == str(schedule.cronjob)
 
-
-
-
-
-
-
-```yaml
-ds_postgres_customer_tbl:
-  type: dataset.PostgresTbl
-  conn: 'conn_postgres_data'
-  features:
-    id: serial primary key
-    name: varchar( 100 ) not null
+cron_iter = schedule.generate('2022-01-01 00:00:00')
+assert '2022-01-01 00:05:00' f"{cron_iter.next:%Y-%m-%d %H:%M:%S}"
+assert '2022-01-01 00:10:00' f"{cron_iter.next:%Y-%m-%d %H:%M:%S}"
+assert '2022-01-01 00:15:00' f"{cron_iter.next:%Y-%m-%d %H:%M:%S}"
+assert '2022-01-01 00:20:00' f"{cron_iter.next:%Y-%m-%d %H:%M:%S}"
 ```
 
-
-from ddeutil.workflow.vendors.pg import PostgresTbl
-
-dataset = PostgresTbl.from_loader(name='ds_postgres_customer_tbl', externals={})
-assert dataset.exists()
-```
+### Pipeline
 
-
+The **Pipeline** object that is the core feature of this project.
 
 ```yaml
-
-type:
-
+run_py_local:
+  type: ddeutil.workflow.pipeline.Pipeline
+  on: 'on_every_5_min'
+  params:
+    author-run:
+      type: str
+    run-date:
+      type: datetime
 ```
 
 ```python
-from ddeutil.workflow.
-
-scdl = Schedule.from_loader(name='schd_for_node', externals={})
-assert '*/5 * * * *' == str(scdl.cronjob)
+from ddeutil.workflow.pipeline import Pipeline
 
-
-
-assert '2022-01-01 00:10:00' f"{cron_iterate.next:%Y-%m-%d %H:%M:%S}"
-assert '2022-01-01 00:15:00' f"{cron_iterate.next:%Y-%m-%d %H:%M:%S}"
-assert '2022-01-01 00:20:00' f"{cron_iterate.next:%Y-%m-%d %H:%M:%S}"
-assert '2022-01-01 00:25:00' f"{cron_iterate.next:%Y-%m-%d %H:%M:%S}"
+pipe = Pipeline.from_loader(name='run_py_local', externals={})
+pipe.execute(params={'author-run': 'Local Workflow', 'run-date': '2024-01-01'})
 ```
 
-
+> [!NOTE]
+> The above parameter use short declarative statement. You can pass a parameter
+> type to the key of a parameter name.
+> ```yaml
+>   params:
+>     author-run: str
+>     run-date: datetime
+> ```
+>
+> And for the type, you can remove `ddeutil.workflow` prefix because we can find
+> it by looping search from `WORKFLOW_CORE_REGISTRY` value.
+
+## Usage
 
 This is examples that use workflow file for running common Data Engineering
 use-case.
 
-
+> [!IMPORTANT]
+> I recommend you to use `task` stage for all actions that you want to do with
+> pipeline object.
 
-
+### Python & Bash
 
 ```yaml
 run_py_local:
-  type:
+  type: pipeline.Pipeline
   params:
-    author-run:
-
-    run-date:
-      type: datetime
+    author-run: str
+    run-date: datetime
   jobs:
     first-job:
      stages:
-        - name: Printing Information
+        - name: "Printing Information"
          id: define-func
          run: |
            x = '${{ params.author-run }}'
@@ -181,7 +180,7 @@ run_py_local:
            def echo(name: str):
              print(f'Hello {name}')
 
-        - name: Run Sequence and use var from Above
+        - name: "Run Sequence and use var from Above"
          vars:
            x: ${{ params.author-run }}
          run: |
@@ -189,11 +188,17 @@ run_py_local:
            # Change x value
            x: int = 1
 
-        - name: Call Function
+        - name: "Call Function"
          vars:
            echo: ${{ stages.define-func.outputs.echo }}
          run: |
            echo('Caller')
+    second-job:
+      stages:
+        - name: "Echo Bash Script"
+          id: shell-echo
+          bash: |
+            echo "Hello World from Shell"
 ```
 
 ```python
@@ -207,24 +212,23 @@ pipe.execute(params={'author-run': 'Local Workflow', 'run-date': '2024-01-01'})
 > Hello Local Workflow
 > Receive x from above with Local Workflow
 > Hello Caller
+> Hello World from Shell
 ```
 
-###
+### Hook (Extract & Load)
 
 ```yaml
 pipe_el_pg_to_lake:
-  type:
+  type: pipeline.Pipeline
   params:
-    run-date:
-
-    author-email:
-      type: str
+    run-date: datetime
+    author-email: str
  jobs:
    extract-load:
      stages:
        - name: "Extract Load from Postgres to Lake"
          id: extract-load
-
+          uses: tasks/postgres-to-delta@polars
          with:
            source:
              conn: conn_postgres_url
@@ -236,11 +240,11 @@ pipe_el_pg_to_lake:
            endpoint: "/${{ params.name }}"
 ```
 
-###
+### Hook (Transform)
 
 ```yaml
-pipe_hook_mssql_proc:
-  type:
+pipeline_hook_mssql_proc:
+  type: pipeline.Pipeline
  params:
    run_date: datetime
    sp_name: str
@@ -251,7 +255,7 @@ pipe_hook_mssql_proc:
      stages:
        - name: "Transform Data in MS SQL Server"
          id: transform
-
+          uses: tasks/mssql-proc@odbc
          with:
            exec: ${{ params.sp_name }}
            params:
@@ -261,6 +265,30 @@ pipe_hook_mssql_proc:
            target: ${{ params.target_name }}
 ```
 
-##
+## Configuration
 
-
+```bash
+export WORKFLOW_ROOT_PATH=.
+export WORKFLOW_CORE_REGISTRY=ddeutil.workflow,tests.utils
+export WORKFLOW_CORE_PATH_CONF=conf
+```
+
+Application config:
+
+```bash
+export WORKFLOW_APP_DB_URL=postgresql+asyncpg://user:pass@localhost:5432/schedule
+export WORKFLOW_APP_INTERVAL=10
+```
+
+## Deployment
+
+This package able to run as a application service for receive manual trigger
+from the master node via RestAPI.
+
+> [!WARNING]
+> This feature do not start yet because I still research and find the best tool
+> to use it provision an app service, like `starlette`, `fastapi`, `apscheduler`.
+
+```shell
+(venv) $ workflow start -p 7070
+```
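Combining the two core objects documented in the new README content above, here is a short end-to-end sketch. It assumes the `on_every_5_min` and `run_py_local` YAML entries shown in the diff exist under the configured `conf` path, so treat it as illustrative rather than a verbatim recipe:

```python
from ddeutil.workflow.on import On
from ddeutil.workflow.pipeline import Pipeline

# Load the schedule entry defined in the YAML config shown in the README diff.
schedule = On.from_loader(name="on_every_5_min", externals={})
assert str(schedule.cronjob) == "*/5 * * * *"

# Walk the next few trigger datetimes from a fixed start point.
cron_iter = schedule.generate("2022-01-01 00:00:00")
print(f"{cron_iter.next:%Y-%m-%d %H:%M:%S}")  # 2022-01-01 00:05:00
print(f"{cron_iter.next:%Y-%m-%d %H:%M:%S}")  # 2022-01-01 00:10:00

# Load and run the pipeline entry; params follow the `params:` schema in the YAML.
pipe = Pipeline.from_loader(name="run_py_local", externals={})
pipe.execute(params={"author-run": "Local Workflow", "run-date": "2024-01-01"})
```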
--- ddeutil_workflow-0.0.4/README.md
+++ ddeutil_workflow-0.0.6/README.md
@@ -1,22 +1,25 @@
-#
+# Workflow
 
 [](https://github.com/ddeutils/ddeutil-workflow/actions/workflows/tests.yml)
 [](https://pypi.org/project/ddeutil-workflow/)
 [](https://github.com/ddeutils/ddeutil-workflow)
+[](https://github.com/ddeutils/ddeutil-workflow/blob/main/LICENSE)
 
 **Table of Contents**:
 
 - [Installation](#installation)
 - [Getting Started](#getting-started)
-
-  - [
-  - [
-  - [
-  - [Python](#python)
-  - [
-  - [
-
-
+- [Core Features](#core-features)
+  - [On](#on)
+  - [Pipeline](#pipeline)
+- [Usage](#usage)
+  - [Python & Bash](#python--bash)
+  - [Hook (EL)](#hook-extract--load)
+  - [Hook (T)](#hook-transform)
+- [Configuration](#configuration)
+- [Deployment](#deployment)
+
+This **Workflow** objects was created for easy to make a simple metadata
 driven pipeline that able to **ETL, T, EL, or ELT** by `.yaml` file.
 
 I think we should not create the multiple pipeline per use-case if we able to
@@ -39,13 +42,18 @@ pipeline.
 pip install ddeutil-workflow
 ```
 
-This project need `ddeutil-io
+This project need `ddeutil-io` extension namespace packages. If you want to install
+this package with application add-ons, you should add `app` in installation;
+
+```shell
+pip install ddeutil-workflow[app]
+```
 
 ## Getting Started
 
 The first step, you should start create the connections and datasets for In and
 Out of you data that want to use in pipeline of workflow. Some of this component
-is similar component of the **Airflow** because I like it concepts.
+is similar component of the **Airflow** because I like it orchestration concepts.
 
 The main feature of this project is the `Pipeline` object that can call any
 registries function. The pipeline can handle everything that you want to do, it
@@ -56,88 +64,84 @@ will passing parameters and catching the output for re-use it to next step.
 > dynamic registries instead of main features because it have a lot of maintain
 > vendor codes and deps. (I do not have time to handle this features)
 
-###
+### On
 
-The
+The **On** is schedule object.
 
 ```yaml
-
-type:
-
+on_every_5_min:
+  type: on.On
+  cron: "*/5 * * * *"
 ```
 
 ```python
-from ddeutil.workflow.
-
-conn = Conn.from_loader(name='conn_postgres_data', externals={})
-assert conn.ping()
-```
+from ddeutil.workflow.on import On
 
-
+schedule = On.from_loader(name='on_every_5_min', externals={})
+assert '*/5 * * * *' == str(schedule.cronjob)
 
-
-
-
-
-
-ds_postgres_customer_tbl:
-  type: dataset.PostgresTbl
-  conn: 'conn_postgres_data'
-  features:
-    id: serial primary key
-    name: varchar( 100 ) not null
+cron_iter = schedule.generate('2022-01-01 00:00:00')
+assert '2022-01-01 00:05:00' f"{cron_iter.next:%Y-%m-%d %H:%M:%S}"
+assert '2022-01-01 00:10:00' f"{cron_iter.next:%Y-%m-%d %H:%M:%S}"
+assert '2022-01-01 00:15:00' f"{cron_iter.next:%Y-%m-%d %H:%M:%S}"
+assert '2022-01-01 00:20:00' f"{cron_iter.next:%Y-%m-%d %H:%M:%S}"
 ```
 
-
-from ddeutil.workflow.vendors.pg import PostgresTbl
-
-dataset = PostgresTbl.from_loader(name='ds_postgres_customer_tbl', externals={})
-assert dataset.exists()
-```
+### Pipeline
 
-
+The **Pipeline** object that is the core feature of this project.
 
 ```yaml
-
-type:
-
+run_py_local:
+  type: ddeutil.workflow.pipeline.Pipeline
+  on: 'on_every_5_min'
+  params:
+    author-run:
+      type: str
+    run-date:
+      type: datetime
 ```
 
 ```python
-from ddeutil.workflow.
-
-scdl = Schedule.from_loader(name='schd_for_node', externals={})
-assert '*/5 * * * *' == str(scdl.cronjob)
+from ddeutil.workflow.pipeline import Pipeline
 
-
-
-assert '2022-01-01 00:10:00' f"{cron_iterate.next:%Y-%m-%d %H:%M:%S}"
-assert '2022-01-01 00:15:00' f"{cron_iterate.next:%Y-%m-%d %H:%M:%S}"
-assert '2022-01-01 00:20:00' f"{cron_iterate.next:%Y-%m-%d %H:%M:%S}"
-assert '2022-01-01 00:25:00' f"{cron_iterate.next:%Y-%m-%d %H:%M:%S}"
+pipe = Pipeline.from_loader(name='run_py_local', externals={})
+pipe.execute(params={'author-run': 'Local Workflow', 'run-date': '2024-01-01'})
 ```
 
-
+> [!NOTE]
+> The above parameter use short declarative statement. You can pass a parameter
+> type to the key of a parameter name.
+> ```yaml
+>   params:
+>     author-run: str
+>     run-date: datetime
+> ```
+>
+> And for the type, you can remove `ddeutil.workflow` prefix because we can find
+> it by looping search from `WORKFLOW_CORE_REGISTRY` value.
+
+## Usage
 
 This is examples that use workflow file for running common Data Engineering
 use-case.
 
-
+> [!IMPORTANT]
+> I recommend you to use `task` stage for all actions that you want to do with
+> pipeline object.
 
-
+### Python & Bash
 
 ```yaml
 run_py_local:
-  type:
+  type: pipeline.Pipeline
   params:
-    author-run:
-
-    run-date:
-      type: datetime
+    author-run: str
+    run-date: datetime
  jobs:
    first-job:
      stages:
-        - name: Printing Information
+        - name: "Printing Information"
          id: define-func
          run: |
            x = '${{ params.author-run }}'
@@ -146,7 +150,7 @@ run_py_local:
            def echo(name: str):
              print(f'Hello {name}')
 
-        - name: Run Sequence and use var from Above
+        - name: "Run Sequence and use var from Above"
          vars:
            x: ${{ params.author-run }}
          run: |
@@ -154,11 +158,17 @@ run_py_local:
            # Change x value
            x: int = 1
 
-        - name: Call Function
+        - name: "Call Function"
          vars:
            echo: ${{ stages.define-func.outputs.echo }}
          run: |
            echo('Caller')
+    second-job:
+      stages:
+        - name: "Echo Bash Script"
+          id: shell-echo
+          bash: |
+            echo "Hello World from Shell"
 ```
 
 ```python
@@ -172,24 +182,23 @@ pipe.execute(params={'author-run': 'Local Workflow', 'run-date': '2024-01-01'})
 > Hello Local Workflow
 > Receive x from above with Local Workflow
 > Hello Caller
+> Hello World from Shell
 ```
 
-###
+### Hook (Extract & Load)
 
 ```yaml
 pipe_el_pg_to_lake:
-  type:
+  type: pipeline.Pipeline
   params:
-    run-date:
-
-    author-email:
-      type: str
+    run-date: datetime
+    author-email: str
  jobs:
    extract-load:
      stages:
        - name: "Extract Load from Postgres to Lake"
          id: extract-load
-
+          uses: tasks/postgres-to-delta@polars
          with:
            source:
              conn: conn_postgres_url
@@ -201,11 +210,11 @@ pipe_el_pg_to_lake:
            endpoint: "/${{ params.name }}"
 ```
 
-###
+### Hook (Transform)
 
 ```yaml
-pipe_hook_mssql_proc:
-  type:
+pipeline_hook_mssql_proc:
+  type: pipeline.Pipeline
  params:
    run_date: datetime
    sp_name: str
@@ -216,7 +225,7 @@ pipe_hook_mssql_proc:
      stages:
        - name: "Transform Data in MS SQL Server"
          id: transform
-
+          uses: tasks/mssql-proc@odbc
          with:
            exec: ${{ params.sp_name }}
            params:
@@ -226,6 +235,30 @@ pipe_hook_mssql_proc:
            target: ${{ params.target_name }}
 ```
 
-##
+## Configuration
 
-
+```bash
+export WORKFLOW_ROOT_PATH=.
+export WORKFLOW_CORE_REGISTRY=ddeutil.workflow,tests.utils
+export WORKFLOW_CORE_PATH_CONF=conf
+```
+
+Application config:
+
+```bash
+export WORKFLOW_APP_DB_URL=postgresql+asyncpg://user:pass@localhost:5432/schedule
+export WORKFLOW_APP_INTERVAL=10
+```
+
+## Deployment
+
+This package able to run as a application service for receive manual trigger
+from the master node via RestAPI.
+
+> [!WARNING]
+> This feature do not start yet because I still research and find the best tool
+> to use it provision an app service, like `starlette`, `fastapi`, `apscheduler`.
+
+```shell
+(venv) $ workflow start -p 7070
+```
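The Configuration section added in 0.0.6 is driven purely by environment variables. Below is a hedged sketch of setting them from Python before loading a pipeline; it assumes the loader reads `WORKFLOW_ROOT_PATH`, `WORKFLOW_CORE_PATH_CONF`, and `WORKFLOW_CORE_REGISTRY` at load time, which this diff implies but does not spell out:

```python
import os

# Assumption: these variables are consulted when a config entry is loaded.
os.environ["WORKFLOW_ROOT_PATH"] = "."          # project root that holds the conf/ folder
os.environ["WORKFLOW_CORE_PATH_CONF"] = "conf"  # directory containing the YAML entries
os.environ["WORKFLOW_CORE_REGISTRY"] = "ddeutil.workflow,tests.utils"  # hook search modules

from ddeutil.workflow.pipeline import Pipeline

# Load and run the `run_py_local` pipeline entry from the configured conf path.
pipe = Pipeline.from_loader(name="run_py_local", externals={})
pipe.execute(params={"author-run": "Local Workflow", "run-date": "2024-01-01"})
```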