awehflow 0.0.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- awehflow-0.0.0/LICENSE +25 -0
- awehflow-0.0.0/PKG-INFO +374 -0
- awehflow-0.0.0/README.md +354 -0
- awehflow-0.0.0/awehflow/__init__.py +0 -0
- awehflow-0.0.0/awehflow/alerts/__init__.py +0 -0
- awehflow-0.0.0/awehflow/alerts/base.py +7 -0
- awehflow-0.0.0/awehflow/alerts/googlechat.py +113 -0
- awehflow-0.0.0/awehflow/alerts/slack.py +133 -0
- awehflow-0.0.0/awehflow/config.py +59 -0
- awehflow-0.0.0/awehflow/core.py +341 -0
- awehflow-0.0.0/awehflow/events/__init__.py +0 -0
- awehflow-0.0.0/awehflow/events/alerts.py +18 -0
- awehflow-0.0.0/awehflow/events/base.py +17 -0
- awehflow-0.0.0/awehflow/events/gcp.py +32 -0
- awehflow-0.0.0/awehflow/events/postgres.py +145 -0
- awehflow-0.0.0/awehflow/operators/__init__.py +0 -0
- awehflow-0.0.0/awehflow/operators/etl.py +132 -0
- awehflow-0.0.0/awehflow/operators/flow.py +105 -0
- awehflow-0.0.0/awehflow/operators/gcp.py +425 -0
- awehflow-0.0.0/awehflow/operators/gcs.py +401 -0
- awehflow-0.0.0/awehflow/operators/metrics.py +28 -0
- awehflow-0.0.0/awehflow/operators/taxonomy.py +181 -0
- awehflow-0.0.0/awehflow/sensors/__init__.py +0 -0
- awehflow-0.0.0/awehflow/sensors/http.py +99 -0
- awehflow-0.0.0/awehflow/sensors/sql_sensor.py +39 -0
- awehflow-0.0.0/awehflow/utils.py +40 -0
- awehflow-0.0.0/awehflow.egg-info/PKG-INFO +374 -0
- awehflow-0.0.0/awehflow.egg-info/SOURCES.txt +34 -0
- awehflow-0.0.0/awehflow.egg-info/dependency_links.txt +1 -0
- awehflow-0.0.0/awehflow.egg-info/requires.txt +9 -0
- awehflow-0.0.0/awehflow.egg-info/top_level.txt +1 -0
- awehflow-0.0.0/setup.cfg +4 -0
- awehflow-0.0.0/setup.py +46 -0
- awehflow-0.0.0/tests/test_config.py +65 -0
- awehflow-0.0.0/tests/test_core.py +654 -0
- awehflow-0.0.0/tests/test_utils.py +48 -0
awehflow-0.0.0/LICENSE
ADDED
@@ -0,0 +1,25 @@

Spatialedge Community License

Version 1.0

This Spatialedge Community License Agreement Version 1.0 (the “Agreement”), based on the Confluent Community License Version 1.0 available at http://www.confluent.io/confluent-community-license, sets forth the terms on which Spatialedge (Pty) Ltd (“Spatialedge”) makes available certain software made available by Spatialedge under this Agreement (the “Software”). BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY OF THE SOFTWARE, YOU AGREE TO THE TERMS AND CONDITIONS OF THIS AGREEMENT. IF YOU DO NOT AGREE TO SUCH TERMS AND CONDITIONS, YOU MUST NOT USE THE SOFTWARE. IF YOU ARE RECEIVING THE SOFTWARE ON BEHALF OF A LEGAL ENTITY, YOU REPRESENT AND WARRANT THAT YOU HAVE THE ACTUAL AUTHORITY TO AGREE TO THE TERMS AND CONDITIONS OF THIS AGREEMENT ON BEHALF OF SUCH ENTITY. “Licensee” means you, an individual, or the entity on whose behalf you are receiving the Software.

LICENSE GRANT AND CONDITIONS.

1.1 License. Subject to the terms and conditions of this Agreement, Spatialedge hereby grants to Licensee a non-exclusive, royalty-free, worldwide, non-transferable, non-sublicensable license during the term of this Agreement to: (a) use the Software; (b) prepare modifications and derivative works of the Software; (c) distribute the Software (including without limitation in source code or object code form); and (d) reproduce copies of the Software (the “License”). Licensee is not granted the right to, and Licensee shall not, exercise the License for an Excluded Purpose. For purposes of this Agreement, “Excluded Purpose” means making available any software-as-a-service, platform-as-a-service, infrastructure-as-a-service or other similar online service that competes with Spatialedge products or services that provide the Software.

1.2 Conditions. In consideration of the License, Licensee’s distribution of the Software is subject to the following conditions:

a. Licensee must cause any Software modified by Licensee to carry prominent notices stating that Licensee modified the Software.

b. On each Software copy, Licensee shall reproduce and not remove or alter all Spatialedge or third party copyright or other proprietary notices contained in the Software, and Licensee must provide the Notice as quoted in 1.2.1 below with each copy.

1.2.1 Notice.

"THIS SOFTWARE IS MADE AVAILABLE BY SPATIALEDGE (PTY) LTD, UNDER THE TERMS OF THE SPATIALEDGE COMMUNITY LICENSE AGREEMENT, VERSION 1.0. BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT."

1.3 Licensee Modifications. Licensee may add its own copyright notices to modifications made by Licensee and may provide additional or different license terms and conditions for use, reproduction, or distribution of Licensee’s modifications. While redistributing the Software or modifications thereof, Licensee may choose to offer, for a fee or free of charge, support, warranty, indemnity, or other obligations. Licensee, and not Spatialedge, will be responsible for any such obligations.

1.4 No Sublicensing. The License does not include the right to sublicense the Software, however, each recipient to which Licensee provides the Software may exercise the Licenses so long as such recipient agrees to the terms and conditions of this Agreement.

TERM AND TERMINATION.

This Agreement will continue unless and until earlier terminated as set forth herein. If Licensee breaches any of its conditions or obligations under this Agreement, this Agreement will terminate automatically and the License will terminate automatically and permanently.

INTELLECTUAL PROPERTY.

As between the parties, Spatialedge will retain all right, title, and interest in the Software, and all intellectual property rights therein. Spatialedge hereby reserves all rights not expressly granted to Licensee in this Agreement. Spatialedge hereby reserves all rights in its trademarks and service marks, and no licenses therein are granted in this Agreement.

DISCLAIMER.

SPATIALEDGE HEREBY DISCLAIMS ANY AND ALL WARRANTIES AND CONDITIONS, EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE, AND SPECIFICALLY DISCLAIMS ANY WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, WITH RESPECT TO THE SOFTWARE.

LIMITATION OF LIABILITY.

SPATIALEDGE WILL NOT BE LIABLE FOR ANY DAMAGES OF ANY KIND, INCLUDING BUT NOT LIMITED TO, LOST PROFITS OR ANY CONSEQUENTIAL, SPECIAL, INCIDENTAL, INDIRECT, OR DIRECT DAMAGES, HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, ARISING OUT OF THIS AGREEMENT. THE FOREGOING SHALL APPLY TO THE EXTENT PERMITTED BY APPLICABLE LAW.

GENERAL.

6.1 Governing Law. This Agreement will be governed by and interpreted in accordance with the laws of the Republic of South Africa, without reference to its conflict of laws principles. If Licensee is located within the Republic of South Africa, all disputes arising out of this Agreement are subject to the exclusive jurisdiction of courts located in Cape Town, South Africa. If Licensee is located outside of South Africa, any dispute, controversy or claim arising out of or relating to this Agreement will be referred to and finally determined by arbitration in accordance with the JAMS International Arbitration Rules. The tribunal will consist of one arbitrator. The place of arbitration will be Cape Town, South Africa. The language to be used in the arbitral proceedings will be English. Judgment upon the award rendered by the arbitrator may be entered in any court having jurisdiction thereof.

6.2. Assignment. Licensee is not authorized to assign its rights under this Agreement to any third party. Spatialedge may freely assign its rights under this Agreement to any third party.

6.3. Other. This Agreement is the entire agreement between the parties regarding the subject matter hereof. No amendment or modification of this Agreement will be valid or binding upon the parties unless made in writing and signed by the duly authorized representatives of both parties. In the event that any provision, including without limitation any condition, of this Agreement is held to be unenforceable, this Agreement and all licenses and rights granted hereunder will immediately terminate. Waiver by Spatialedge of a breach of any provision of this Agreement or the failure by Spatialedge to exercise any right hereunder will not be construed as a waiver of any subsequent breach of that right or as a waiver of any other right.
awehflow-0.0.0/PKG-INFO
ADDED
@@ -0,0 +1,374 @@

Metadata-Version: 2.1
Name: awehflow
Version: 0.0.0
Summary: Configuration based Apache Airflow
Home-page:
Author: Philip Perold
Author-email: philip@spatialedge.co.za
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: dotty-dict>=1.2.1
Requires-Dist: pyhocon>=0.3.59
Requires-Dist: slackclient>=2.5.0
Provides-Extra: default
Requires-Dist: apache-airflow==2.1.4; extra == "default"
Provides-Extra: composer
Requires-Dist: apache-airflow==2.1.4+composer; extra == "composer"

# Awehflow




Configuration based Airflow pipelines with metric logging and alerting out of the box.

## Prerequisites

### Development environment
In order to develop awehflow for a given version of Airflow, follow these steps:
1. Install and configure miniconda
   1. On Mac, if running ARM, create an x86 version of conda using the snippet below
      ```bash
      ### add this to ~/.zshrc (or ~/.bashrc if you're using Bash)
      create_x86_conda_environment () {
        # create a conda environment using x86 architecture
        # first argument is environment name, all subsequent arguments will be passed to `conda create`
        # example usage: create_x86_conda_environment myenv_x86 python=3.9

        CONDA_SUBDIR=osx-64 conda create $@
        conda activate $2
        conda config --env --set subdir osx-64
      }
      ```
1. Define the version that you'd like to install
   ```bash
   export AIRFLOW_VERSION="2.1.4"
   ```
1. Create a conda environment for your version of Airflow
   ```bash
   create_x86_conda_environment -n "airflow_$AIRFLOW_VERSION" "python=3.8.12"
   ```
1. Configure the AIRFLOW_HOME directory
   ```bash
   conda deactivate
   conda activate "airflow_$AIRFLOW_VERSION"
   conda env config vars set AIRFLOW_HOME="$HOME/airflow/airflow_$AIRFLOW_VERSION"
   conda deactivate
   conda activate "airflow_$AIRFLOW_VERSION"
   echo "$AIRFLOW_HOME"
   ```
1. Install Airflow using `pip`
   ```bash
   conda activate "airflow_$AIRFLOW_VERSION"
   pip install --no-cache-dir "apache-airflow==$AIRFLOW_VERSION"
   ```
1. Install the required providers
   ```bash
   conda activate "airflow_$AIRFLOW_VERSION"
   pip install --no-cache-dir "apache-airflow[google]==$AIRFLOW_VERSION"
   pip install --no-cache-dir "apache-airflow-providers-ssh==3.7.0"
   pip install --no-cache-dir "apache-airflow[postgres]==$AIRFLOW_VERSION"
   ```
1. On macOS ARM, install the psycopg2 binary package
   ```bash
   pip install --no-cache-dir "psycopg2-binary==`pip list | grep -i 'psycopg2 ' | tr -s ' ' | cut -d' ' -f 2`"
   ```
1. Customisation per version
   1. For `2.2.3`
      1. force the MarkupSafe package version
         ```bash
         pip install --no-cache-dir markupsafe==2.0.1
         ```
   1. For `2.5.3`
      1. force the pendulum package version
         ```bash
         pip install --no-cache-dir "pendulum==2.0.0"
         ```
      1. force the Flask-Session package version
         ```bash
         pip install --no-cache-dir "Flask-Session==0.5.0"
         ```
1. Install the awehflow requirements
   ```bash
   pip install --no-cache-dir -r requirements.txt
   ```
1. Initialise the Airflow database
   ```bash
   airflow db init
   ```

You will need the following to run this code:
* Python 3

## Installation

```
pip install awehflow[default]
```

If you are installing on Google Cloud Composer (which pins its own Airflow build, per the `composer` extra above):

```
pip install awehflow[composer]
```

### Event & metric tables
Create a `postgresql` database that can be referenced via an Airflow connection. In the database, create the following tables:
- Jobs data table
```sql
CREATE TABLE public.jobs (
    id serial4 NOT NULL,
    run_id varchar NOT NULL,
    dag_id varchar NULL,
    "name" varchar NULL,
    project varchar NULL,
    status varchar NULL,
    engineers json NULL,
    error json NULL,
    start_time timestamptz NULL,
    end_time timestamptz NULL,
    reference_time timestamptz NULL,
    CONSTRAINT job_id_pkey PRIMARY KEY (id),
    CONSTRAINT run_id_dag_id_unique UNIQUE (run_id, dag_id)
);
```

- Task metrics table
```sql
CREATE TABLE public.task_metrics (
    id serial4 NOT NULL,
    run_id varchar NULL,
    dag_id varchar NULL,
    task_id varchar NULL,
    job_name varchar NULL,
    value json NULL,
    created_time timestamptz NULL,
    reference_time timestamptz NULL,
    CONSTRAINT task_metrics_id_pkey PRIMARY KEY (id)
);
```

- Data metrics table
```sql
CREATE TABLE public.data_metrics (
    id serial4 NOT NULL,
    platform varchar NULL,
    "source" varchar NULL,
    "key" varchar NULL,
    value json NULL,
    reference_time timestamptz NULL,
    CONSTRAINT data_metrics_pkey PRIMARY KEY (id),
    CONSTRAINT unique_metric UNIQUE (platform, source, key, reference_time)
);
```
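
With these tables in place, job outcomes can be queried directly. As a hypothetical example (only the column names above are taken from the schema; the query itself is not part of awehflow), this lists the most recent run of each DAG with its status and duration:

```sql
-- Hypothetical example query: latest run per DAG from the jobs table
SELECT DISTINCT ON (dag_id)
    dag_id,
    run_id,
    status,
    end_time - start_time AS duration
FROM public.jobs
ORDER BY dag_id, start_time DESC NULLS LAST;
```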

## Usage

Usage of `awehflow` can be broken up into two parts: bootstrapping and configuration of pipelines.

### Bootstrap

In order to expose the generated pipelines (`airflow` _DAGs_) for `airflow` to pick up when scanning for _DAGs_, one has to create a `DagLoader` that points to a folder where the pipeline configuration files will be located:

```python
import os

from awehflow.alerts.slack import SlackAlerter
from awehflow.core import DagLoader
from awehflow.events.postgres import PostgresMetricsEventHandler

"""airflow doesn't pick up DAGs in files unless
the words 'airflow' and 'DAG' feature"""

configs_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'configs')

metrics_handler = PostgresMetricsEventHandler(jobs_table='jobs', task_metrics_table='task_metrics')

slack_alerter = SlackAlerter(channel='#airflow')

loader = DagLoader(
    project="awehflow-demo",
    configs_path=configs_path,
    event_handlers=[metrics_handler],
    alerters=[slack_alerter]
)

dags = loader.load(global_symbol_table=globals())
```

As seen in the code snippet, one can also pass in _"event handlers"_ and _"alerters"_ to perform actions on certain pipeline events and potentially alert the user of certain events on a given channel. See the sections below for more detail.
The global symbol table needs to be passed to the `loader` since `airflow` scans it for objects of type `DAG`, and then synchronises the state with its own internal state store.

\*_caveat_: `airflow` ignores `python` files that don't contain the words _"airflow"_ and _"DAG"_. It is thus advised to put those words in a comment to ensure the generated _DAGs_ get picked up when the `DagBag` is getting filled.

#### Event Handlers

As a pipeline generated using `awehflow` is running, certain events get emitted. An event handler gives the user the option of running code when these events occur.

The following events are potentially emitted as a pipeline runs:

* `start`
* `success`
* `failure`
* `task_metric`

Existing event handlers include:

* `PostgresMetricsEventHandler`: persists pipeline metrics to a Postgres database
* `PublishToGooglePubSubEventHandler`: events get passed straight to a Google Pub/Sub topic

An `AlertsEventHandler` gets automatically added to a pipeline. Events get passed along to registered alerters.
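
A custom event handler follows the same pattern. The actual base class lives in `awehflow.events.base`; the sketch below is a minimal, hypothetical handler that simply collects events in memory. The `handle` method name and the event dict shape are assumptions for illustration, not the confirmed awehflow interface:

```python
class InMemoryEventHandler:
    """Hypothetical event handler sketch: collects pipeline events in a list.

    The real base class is defined in awehflow.events.base; the method name
    `handle` and the event dict layout here are assumptions.
    """

    def __init__(self):
        self.events = []

    def handle(self, event):
        # An event is assumed to be a dict with at least a 'name' key,
        # e.g. {'name': 'start', 'body': {...}}
        self.events.append(event)


handler = InMemoryEventHandler()
handler.handle({'name': 'start', 'body': {'run_id': 'demo_run'}})
handler.handle({'name': 'success', 'body': {'run_id': 'demo_run'}})
```

Such a handler would then be passed to `DagLoader(..., event_handlers=[...])` alongside the built-in ones.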

#### Alerters

An `Alerter` is merely a class that implements an `alert` method. By default a `SlackAlerter` is configured in the `dags/PROJECT/bootstrap.py` file of an awehflow project. awehflow supports the addition of multiple alerters, which allows success or failure events to be sent to multiple channels.

##### YAML configuration
In order to add alerts to an awehflow DAG, add the following to the root space of the configuration:
```YAML
alert_on:
  - 'failure' # Send out a formatted message if a task in the DAG fails. This is optional
  - 'success' # Send out a formatted message once the DAG completes successfully. This is optional
```

##### Available alerters

###### `SlackAlerter` - `awehflow.alerts.slack.SlackAlerter`
Sends an alert to a specified Slack channel via the Slack webhook functionality

- Parameters
  - `channel` - The name of the channel that the alerts should be sent to
  - `slack_conn_id` - The name of the Airflow connection that contains the token information, default: `slack_default`
- Connection requirements - Create an HTTP connection with the name specified for `slack_conn_id`; the required HTTP fields are:
  - `password` - The Slack token issued by your admin team, which allows for the sending of messages via the Slack Python API

###### `GoogleChatAlerter` - `awehflow.alerts.googlechat.GoogleChatAlerter`
Sends an alert to the configured Google Chat space
- Parameters
  - `gchat_conn_id` - The name of the Airflow connection that contains the GChat space information, default: `gchat_default`
- Connection requirements - Create an HTTP connection with the name specified for the `gchat_conn_id`; the required HTTP fields are:
  - `host` - The GChat spaces URL `https://chat.googleapis.com/v1/spaces`
  - `password` - The GChat spaces key configuration information, e.g. `https://chat.googleapis.com/v1/spaces/SPACES_ID?key=SPACES_KEY`
    - `SPACES_ID` - Should be supplied by your GChat admin team
    - `SPACES_KEY` - Should be supplied by your GChat admin team
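
Because an alerter only needs an `alert` method, writing one is straightforward. The sketch below is a hypothetical alerter that prints to stdout instead of posting to a chat service; the event dict layout is an assumption for illustration, not the confirmed awehflow interface:

```python
class StdoutAlerter:
    """Hypothetical alerter sketch: prints alerts instead of posting to Slack/GChat.

    Only the `alert` method is required; the event dict layout is an assumption.
    """

    def alert(self, event):
        name = event.get('name', 'unknown')
        job = event.get('body', {}).get('name', 'unknown job')
        message = f"[{name.upper()}] job '{job}'"
        print(message)
        return message


alerter = StdoutAlerter()
alerter.alert({'name': 'failure', 'body': {'name': 'my_first_dag'}})
```

An instance of such a class would be passed to `DagLoader(..., alerters=[...])` in the bootstrap file, alongside or instead of the `SlackAlerter`.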
### Configuration
Awehflow configuration files can be written as `.yml` or `.hocon` files; both formats are supported.

Shown below is a sample HOCON configuration file:
```hocon
{
  name: my_first_dag,
  version: 1,
  description: "This is my first dag",
  owner: The name of the owner of the DAG
  schedule: "10 0 * * *",
  start_date: 2022-01-01,
  end_date: 2022-01-01,
  catchup: true,
  concurrency: 1 // Defaults to airflow configuration
  max_active_tasks: 1 // Defaults to airflow configuration
  max_active_runs: 1 // Defaults to airflow configuration
  dagrun_timeout: None
  doc_md: The DAG documentation markdown
  access_control: None // A dict of roles that have specific permissions
  is_paused_upon_creation: None // Defaults to airflow configuration
  tags: [
    'tag one',
    'tag two'
  ],
  dag_params: {
    /* This dict defines DAG parameters that are defaulted when triggering a DAG manually with CONF.
       Values are accessible as template values, e.g. {{ dag_run.conf["config_value_1"] }}
    */
    'config_value_1': 'SOME TEXT',
    'config_value_2': 1234
  },
  alert_on: [ // Whether the events alerter should send a message on success OR failure
    success,
    failure
  ],
  params: { // Parameter values that will be passed in to each task for rendering
    default: {
      source_folder: /tmp
    },
    production: {
      source_folder: /data
    }
  },
  default_dag_args: { // The default DAG arguments, which are also passed to each task in the DAG
    retries: 1
  },
  pre_hooks: [ // Pre hook sensors are executed BEFORE the start task
    {
      id: 'pre_hook_ping_sensor'
      operator: 'airflow.sensors.bash.BashSensor'
      params: {
        bash_command: 'echo ping'
        mode: 'reschedule'
      }
    }
  ],
  dependencies: [ // Dependency sensors are executed AFTER the start task, prior to the DAG start time being logged
    {
      id: 'dependency_ping_sensor'
      operator: 'airflow.sensors.bash.BashSensor'
      params: {
        bash_command: 'echo ping'
        mode: 'reschedule'
      }
    }
  ],
  tasks: [ // The array of tasks that defines the DAG
    {
      id: first_dummy_task, // The task ID that will be shown in the task bubble or tree
      operator: airflow.operators.dummy.DummyOperator, // The fully qualified path to the Class of the Operator
    },
    {
      id: first_bash_task, // The task ID that will be shown in the task bubble or tree
      operator: airflow.operators.bash.BashOperator, // The fully qualified path to the Class of the Operator
      params: {
        /*
          The dictionary of parameters that will be passed to the Operator; the "default_dag_args" dict will be merged with this.
          Any parameter of the Operator Class can be added to this dict; template rendering of values depends on the specific Operator
        */
        bash_command: 'echo "Hello World"'
      },
      upstream: [ // The list of tasks that must be executed prior to this task
        first_dummy_task
      ]
    }
  ]
}
```
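
Since `.yml` files are equally supported, the same structure can be expressed in YAML. A brief excerpt of the configuration above, restricted to a few of its keys for illustration:

```yaml
name: my_first_dag
version: 1
description: "This is my first dag"
schedule: "10 0 * * *"
start_date: 2022-01-01
catchup: true
alert_on:
  - success
  - failure
tasks:
  - id: first_dummy_task
    operator: airflow.operators.dummy.DummyOperator
```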

This configuration does the following:
- Creates a DAG called `my_first_dag`
- Scheduled to run daily, 10 minutes past midnight
- Catchup has been enabled to ensure all runs of the DAG since 2022-01-01 are executed
- Pre hooks
  - Check if the command `echo ping` succeeds
- Dependencies
  - Check if the command `echo ping` succeeds
- Tasks
  - First run a dummy task that does nothing
  - If the dummy task succeeds, execute the bash command

## Running the tests

Tests may be run with
```bash
python -m unittest discover tests
```

or, to also collect code coverage:

```bash
coverage run -m unittest discover tests && coverage html
```