ddeutil-workflow 0.0.8__py3-none-any.whl → 0.0.9__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- ddeutil/workflow/__about__.py +1 -1
- ddeutil/workflow/__init__.py +3 -14
- ddeutil/workflow/api.py +44 -75
- ddeutil/workflow/cli.py +51 -0
- ddeutil/workflow/cron.py +713 -0
- ddeutil/workflow/loader.py +65 -13
- ddeutil/workflow/log.py +147 -49
- ddeutil/workflow/on.py +18 -15
- ddeutil/workflow/pipeline.py +389 -140
- ddeutil/workflow/repeat.py +9 -5
- ddeutil/workflow/route.py +30 -37
- ddeutil/workflow/scheduler.py +398 -659
- ddeutil/workflow/stage.py +145 -73
- ddeutil/workflow/utils.py +133 -42
- ddeutil_workflow-0.0.9.dist-info/METADATA +273 -0
- ddeutil_workflow-0.0.9.dist-info/RECORD +22 -0
- {ddeutil_workflow-0.0.8.dist-info → ddeutil_workflow-0.0.9.dist-info}/WHEEL +1 -1
- ddeutil_workflow-0.0.9.dist-info/entry_points.txt +2 -0
- ddeutil/workflow/app.py +0 -45
- ddeutil_workflow-0.0.8.dist-info/METADATA +0 -266
- ddeutil_workflow-0.0.8.dist-info/RECORD +0 -20
- {ddeutil_workflow-0.0.8.dist-info → ddeutil_workflow-0.0.9.dist-info}/LICENSE +0 -0
- {ddeutil_workflow-0.0.8.dist-info → ddeutil_workflow-0.0.9.dist-info}/top_level.txt +0 -0
@@ -1,266 +0,0 @@
Metadata-Version: 2.1
Name: ddeutil-workflow
Version: 0.0.8
Summary: Data Developer & Engineer Workflow Utility Objects
Author-email: ddeutils <korawich.anu@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/ddeutils/ddeutil-workflow/
Project-URL: Source Code, https://github.com/ddeutils/ddeutil-workflow/
Keywords: data,workflow,utility,pipeline
Classifier: Topic :: Utilities
Classifier: Natural Language :: English
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9.13
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fmtutil
Requires-Dist: ddeutil-io
Requires-Dist: python-dotenv ==1.0.1
Provides-Extra: api
Requires-Dist: fastapi[standard] ==0.112.0 ; extra == 'api'
Requires-Dist: apscheduler[sqlalchemy] <4.0.0,==3.10.4 ; extra == 'api'
Requires-Dist: croniter ==3.0.3 ; extra == 'api'
Provides-Extra: app
Requires-Dist: schedule <2.0.0,==1.2.2 ; extra == 'app'

# Workflow

[](https://github.com/ddeutils/ddeutil-workflow/actions/workflows/tests.yml)
[](https://pypi.org/project/ddeutil-workflow/)
[](https://github.com/ddeutils/ddeutil-workflow)
[](https://github.com/ddeutils/ddeutil-workflow/blob/main/LICENSE)

**Table of Contents**:

- [Installation](#installation)
- [Getting Started](#getting-started)
  - [On](#on)
  - [Pipeline](#pipeline)
- [Usage](#usage)
  - [Python & Bash](#python--bash)
  - [Hook (EL)](#hook-extract--load)
  - [Hook (T)](#hook-transform)
- [Configuration](#configuration)
- [Deployment](#deployment)

The **Workflow** objects in this package make it easy to build simple,
metadata-driven data pipeline orchestration that can be used for **ETL, T, EL,
or ELT** from a `.yaml` template file.

In my opinion, there is no need to duplicate pipeline code when a single
template pipeline with dynamic input parameters can be reused, changing only
the parameters per use case.
This way I can manage many logical pipelines across our organization with
nothing but metadata configuration. I call this approach the
**Metadata Driven Data Pipeline**.

Next, we should add monitoring tools to manage the logs returned from pipeline
runs, because the logs alone do not show which use case a running pipeline
belongs to.

> [!NOTE]
> _Disclaimer_: The dynamic statements are inspired by GitHub Actions `.yml`
> files, and the config-file approach by several data orchestration frameworks
> from my experience as a Data Engineer.

## Installation

```shell
pip install ddeutil-workflow
```

This project needs the `ddeutil-io` extension namespace package. If you want to
install this package with its application add-ons, include the relevant extra
in the installation:

| Usecase           | Install Optional        |
|-------------------|-------------------------|
| Scheduler Service | `ddeutil-workflow[app]` |
| FastAPI Server    | `ddeutil-workflow[api]` |

## Getting Started

As a first step, you should create the connections and datasets for the inputs
and outputs of the data you want to use in a workflow pipeline. Some of these
components are similar to **Airflow** components, because I like its
orchestration concepts.

The main feature of this project is the `Pipeline` object, which can call any
registered function. The pipeline can handle everything you want to do; it
passes parameters along and catches outputs for reuse in the next step.

> [!IMPORTANT]
> In the future, I will move connections and datasets out of the main features
> and into dynamic registries instead, because they require maintaining a lot
> of vendor code and dependencies. (I do not have time to handle these
> features.)

### On

The **On** object is the schedule object.

```yaml
on_every_5_min:
  type: on.On
  cron: "*/5 * * * *"
```

```python
from ddeutil.workflow.on import On

schedule = On.from_loader(name='on_every_5_min', externals={})
assert '*/5 * * * *' == str(schedule.cronjob)

cron_iter = schedule.generate('2022-01-01 00:00:00')
assert '2022-01-01 00:05:00' == f"{cron_iter.next:%Y-%m-%d %H:%M:%S}"
assert '2022-01-01 00:10:00' == f"{cron_iter.next:%Y-%m-%d %H:%M:%S}"
assert '2022-01-01 00:15:00' == f"{cron_iter.next:%Y-%m-%d %H:%M:%S}"
assert '2022-01-01 00:20:00' == f"{cron_iter.next:%Y-%m-%d %H:%M:%S}"
```

### Pipeline

The **Pipeline** object is the core feature of this project.

```yaml
run_py_local:
  type: ddeutil.workflow.pipeline.Pipeline
  on: 'on_every_5_min'
  params:
    author-run:
      type: str
    run-date:
      type: datetime
```

```python
from ddeutil.workflow.pipeline import Pipeline

pipe = Pipeline.from_loader(name='run_py_local', externals={})
pipe.execute(params={'author-run': 'Local Workflow', 'run-date': '2024-01-01'})
```

> [!NOTE]
> The parameters above use the full declarative statement. As a shorthand, you
> can pass a parameter type directly as the value of a parameter name:
> ```yaml
> params:
>   author-run: str
>   run-date: datetime
> ```
>
> For the type value, you can omit the `ddeutil.workflow` prefix because it can
> be found by searching through the `WORKFLOW_CORE_REGISTRY` value.

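The registry search mentioned in the note can be pictured with a small sketch. This is illustrative only; `find_type`, its signature, and the search order are assumptions for explanation, not the package's actual loader code:

```python
import importlib

# Hypothetical registry list mirroring the WORKFLOW_CORE_REGISTRY setting;
# the real loader's search logic may differ.
REGISTRY = ["ddeutil.workflow", "tests.utils"]

def find_type(name: str, registry=None):
    """Search each registry module in order for a dotted attribute path."""
    for base in registry or REGISTRY:
        try:
            obj = importlib.import_module(base)
        except ImportError:
            continue  # skip modules that are not installed
        try:
            for part in name.split("."):
                obj = getattr(obj, part)
        except AttributeError:
            continue  # not found in this module; try the next one
        return obj
    raise ValueError(f"type {name!r} not found in any registry module")

# Usage with a stdlib module standing in for the registry:
print(find_type("OrderedDict", registry=["collections"]).__name__)  # prints: OrderedDict
```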
## Usage

These are examples that use a workflow file for running common Data Engineering
use cases.

> [!IMPORTANT]
> I recommend using the `task` stage for all actions that you want the pipeline
> object to perform.

```yaml
run_py_local:
  type: pipeline.Pipeline
  on:
    - cronjob: '* * * * *'
      timezone: "Asia/Bangkok"
  params:
    author-run: str
    run-date: datetime
  jobs:
    first-job:
      stages:
        - name: "Printing Information"
          id: define-func
          run: |
            x = '${{ params.run-date | fmt("%Y%m%d") }}'
            print(f'Hello at {x}')

            def echo(name: str):
                print(f'Hello {name}')
        - name: "Run Sequence and use var from Above"
          vars:
            x: ${{ params.author-run }}
          run: |
            print(f'Receive x from above with {x}')
            # Change x value
            x: int = 1
        - name: "Call Function"
          vars:
            echo: ${{ stages.define-func.outputs.echo }}
          run: |
            echo('Caller')
    second-job:
      stages:
        - name: "Echo Bash Script"
          id: shell-echo
          bash: |
            echo "Hello World from Shell"
```

```python
from datetime import datetime
from ddeutil.workflow.pipeline import Pipeline

pipe: Pipeline = Pipeline.from_loader(name='run_py_local', externals={})
pipe.execute(params={
    'author-run': 'Local Workflow',
    'run-date': datetime(2024, 1, 1),
})
```

```shell
> Hello at 20240101
> Receive x from above with Local Workflow
> Hello Caller
> Hello World from Shell
```

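The `${{ params.run-date | fmt("%Y%m%d") }}` expressions in the YAML above are template strings with pipe filters. As an illustrative sketch of how such a resolver could work (this is not the library's actual implementation; the `FILTERS` registry and `resolve` helper are assumptions):

```python
import re
from datetime import datetime

# Hypothetical filter registry; `fmt` mirrors the pipe filter used in the
# YAML example (name and behavior are assumptions for illustration).
FILTERS = {
    "fmt": lambda value, spec: value.strftime(spec),
}

TEMPLATE_RE = re.compile(r"\$\{\{\s*(?P<expr>[^}]+?)\s*\}\}")

def resolve(template: str, params: dict) -> str:
    """Replace each ``${{ key | filter("arg") }}`` with its resolved value."""
    def _sub(match: re.Match) -> str:
        parts = [p.strip() for p in match.group("expr").split("|")]
        # Look up the dotted parameter name, e.g. ``params.run-date``.
        value = params
        for key in parts[0].split(".")[1:]:
            value = value[key]
        # Apply each piped filter in order.
        for flt in parts[1:]:
            name, _, arg = flt.partition("(")
            arg = arg.rstrip(")").strip("\"'")
            value = FILTERS[name.strip()](value, arg)
        return str(value)
    return TEMPLATE_RE.sub(_sub, template)

print(resolve(
    'Hello at ${{ params.run-date | fmt("%Y%m%d") }}',
    {"run-date": datetime(2024, 1, 1)},
))  # prints: Hello at 20240101
```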
## Configuration

```bash
export WORKFLOW_ROOT_PATH=.
export WORKFLOW_CORE_REGISTRY=ddeutil.workflow,tests.utils
export WORKFLOW_CORE_REGISTRY_FILTER=ddeutil.workflow.utils
export WORKFLOW_CORE_PATH_CONF=conf
export WORKFLOW_CORE_TIMEZONE=Asia/Bangkok
export WORKFLOW_CORE_DEFAULT_STAGE_ID=true
export WORKFLOW_CORE_MAX_PIPELINE_POKING=4
export WORKFLOW_CORE_MAX_JOB_PARALLEL=2
```

Application config:

```bash
export WORKFLOW_APP_DB_URL=postgresql+asyncpg://user:pass@localhost:5432/schedule
export WORKFLOW_APP_INTERVAL=10
```

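Since the package depends on `python-dotenv`, these settings presumably arrive as environment strings and are coerced to the needed types. A minimal sketch of such coercion helpers (the `env_int`/`env_bool` names and fallback behavior are assumptions, not the package's actual config loader):

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment with a fallback."""
    raw = os.getenv(name)
    return int(raw) if raw is not None else default

def env_bool(name: str, default: bool) -> bool:
    """Read a boolean setting; accepts true/1/yes (case-insensitive)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("true", "1", "yes")

# Simulate the exported variables from the section above.
os.environ["WORKFLOW_CORE_MAX_JOB_PARALLEL"] = "2"
os.environ["WORKFLOW_CORE_DEFAULT_STAGE_ID"] = "true"

print(env_int("WORKFLOW_CORE_MAX_JOB_PARALLEL", 1))       # prints: 2
print(env_bool("WORKFLOW_CORE_DEFAULT_STAGE_ID", False))  # prints: True
```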
## Deployment

This package can run as an application service that receives manual triggers
from the master node via REST API, or as a scheduler background service that
works like a crontab job but is driven through the Python API.

### Schedule Service

```shell
(venv) $ python src.ddeutil.workflow.app
```

### API Server

```shell
(venv) $ uvicorn src.ddeutil.workflow.api:app --host 0.0.0.0 --port 80 --reload
```

> [!NOTE]
> If this package is already deployed, you can use
> `uvicorn ddeutil.workflow.api:app --host 0.0.0.0 --port 80`
@@ -1,20 +0,0 @@
ddeutil/workflow/__about__.py,sha256=FA15NQYpQvn7SrHupxQQQ9Ad5ZzEXOvwDS5UyB1h1bo,27
ddeutil/workflow/__init__.py,sha256=4PEL3RdHmUowK0Dz-tK7fO0wvFX4u9CLd0Up7b3lrAQ,760
ddeutil/workflow/__types.py,sha256=SYMoxbENQX8uPsiCZkjtpHAqqHOh8rUrarAFicAJd0E,1773
ddeutil/workflow/api.py,sha256=d2Mmv9jTtN3FITIy-2mivyAKdBOGZxtkNWRMPbCLlFI,3341
ddeutil/workflow/app.py,sha256=BuYhOoSJCHiSoj3xb2I5QoxaHrD3bKdmoJua3bKBetc,1165
ddeutil/workflow/exceptions.py,sha256=zuCcsfJ1hFivubXz6lXCpGYXk07d_PkRaUD5ew3_LC0,632
ddeutil/workflow/loader.py,sha256=_ZD-XP5P7VbUeqItrUVPaKIZu6dMUZ2aywbCbReW1hQ,2778
ddeutil/workflow/log.py,sha256=N2TyjcuAoH0YTvzJCHTO037IHgVkLA986Xhtz1LSgE4,1742
ddeutil/workflow/on.py,sha256=YoEqDbzJUwqOA3JRltbvlYr0rNTtxdmb7cWMxl8U19k,6717
ddeutil/workflow/pipeline.py,sha256=VC6VDxycUdGKn13V42RZxAlCFySYb2HIZGq_ku5Kp5k,30844
ddeutil/workflow/repeat.py,sha256=sNoRfbOR4cYm_edrSvlVy9N8Dk_osLIq9FC5GMZz32M,4621
ddeutil/workflow/route.py,sha256=Ck_O1xJwI-vKkMJr37El0-1PGKlwKF8__DDNWVQrf0A,2079
ddeutil/workflow/scheduler.py,sha256=FqmkvWCqwJ4eRf8aDn5Ce4FcNWqmcvu2aTTfL34lfgs,22184
ddeutil/workflow/stage.py,sha256=tbxENx_-2BQ6peXKM_s6RQ1oGzTlXcZ4yDpP1Hufkdk,18095
ddeutil/workflow/utils.py,sha256=seyU81JXfb2zz6QbJvVEb2Wn4qt8f-FBA6QFC97xY5k,21240
ddeutil_workflow-0.0.8.dist-info/LICENSE,sha256=nGFZ1QEhhhWeMHf9n99_fdt4vQaXS29xWKxt-OcLywk,1085
ddeutil_workflow-0.0.8.dist-info/METADATA,sha256=9i7Jk3CZlBpNkmFFjD247opgYA6Mc8AT6CtZjcvamYI,8314
ddeutil_workflow-0.0.8.dist-info/WHEEL,sha256=HiCZjzuy6Dw0hdX5R3LCFPDmFS4BWl8H-8W39XfmgX4,91
ddeutil_workflow-0.0.8.dist-info/top_level.txt,sha256=m9M6XeSWDwt_yMsmH6gcOjHZVK5O0-vgtNBuncHjzW4,8
ddeutil_workflow-0.0.8.dist-info/RECORD,,
File without changes
File without changes