openmedallion-2026.4.1.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- openmedallion-2026.4.1/.gitignore +31 -0
- openmedallion-2026.4.1/CHANGELOG.md +45 -0
- openmedallion-2026.4.1/LICENSE +21 -0
- openmedallion-2026.4.1/PKG-INFO +410 -0
- openmedallion-2026.4.1/README.md +364 -0
- openmedallion-2026.4.1/examples/README.md +110 -0
- openmedallion-2026.4.1/examples/ecommerce_analytics_demo/README.md +254 -0
- openmedallion-2026.4.1/examples/incremental_sql_demo/README.md +171 -0
- openmedallion-2026.4.1/examples/local_parquet_demo/README.md +154 -0
- openmedallion-2026.4.1/openmedallion/__init__.py +17 -0
- openmedallion-2026.4.1/openmedallion/cli/__init__.py +0 -0
- openmedallion-2026.4.1/openmedallion/cli/main.py +235 -0
- openmedallion-2026.4.1/openmedallion/config/__init__.py +4 -0
- openmedallion-2026.4.1/openmedallion/config/loader.py +150 -0
- openmedallion-2026.4.1/openmedallion/config/validator.py +87 -0
- openmedallion-2026.4.1/openmedallion/contracts/__init__.py +3 -0
- openmedallion-2026.4.1/openmedallion/contracts/udf.py +100 -0
- openmedallion-2026.4.1/openmedallion/helpers/__init__.py +0 -0
- openmedallion-2026.4.1/openmedallion/helpers/aggregations.py +145 -0
- openmedallion-2026.4.1/openmedallion/helpers/dates.py +97 -0
- openmedallion-2026.4.1/openmedallion/helpers/joins.py +94 -0
- openmedallion-2026.4.1/openmedallion/helpers/windows.py +153 -0
- openmedallion-2026.4.1/openmedallion/pipeline/__init__.py +0 -0
- openmedallion-2026.4.1/openmedallion/pipeline/bronze.py +172 -0
- openmedallion-2026.4.1/openmedallion/pipeline/export.py +44 -0
- openmedallion-2026.4.1/openmedallion/pipeline/gold.py +85 -0
- openmedallion-2026.4.1/openmedallion/pipeline/nodes.py +64 -0
- openmedallion-2026.4.1/openmedallion/pipeline/silver.py +99 -0
- openmedallion-2026.4.1/openmedallion/scaffold/__init__.py +0 -0
- openmedallion-2026.4.1/openmedallion/scaffold/templates.py +500 -0
- openmedallion-2026.4.1/openmedallion/storage/__init__.py +17 -0
- openmedallion-2026.4.1/openmedallion/storage/fs.py +256 -0
- openmedallion-2026.4.1/openmedallion/viz/__init__.py +0 -0
- openmedallion-2026.4.1/openmedallion/viz/dag.py +149 -0
- openmedallion-2026.4.1/openmedallion/viz/server.py +218 -0
- openmedallion-2026.4.1/openmedallion/viz/tracker.py +77 -0
- openmedallion-2026.4.1/pyproject.toml +81 -0
@@ -0,0 +1,31 @@

# Python
__pycache__/
*.py[cod]
*.egg-info/
.venv/
dist/
build/

# Data (generated outputs)
data/
examples/*/data/
*.parquet
*.csv
!examples/*/data/source/*.csv

# dlt state
.dlt/

# IDE
.vscode/
.idea/

# OS
.DS_Store


# Additional
CLAUDE.md
PLAN.md
graphify-out/*
@@ -0,0 +1,45 @@
# Changelog

All notable changes to openmedallion are documented here.
Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
Versions follow [Semantic Versioning](https://semver.org/).

---

## [Unreleased]

### Added

- GitHub Actions CI workflow (multi-Python matrix, lint)
- GitHub Actions publish workflow (TestPyPI → PyPI via OIDC trusted publishing)
- Expanded `medallion init` scaffold: `backend/`, `frontend/`, `data/` (gitignored),
  `summary/`, and full `README.md` template

---

## [2026.4.1] — 2026-04-22

### Added

- `openmedallion.config` — `load_project`, `expand_env_str`, `_deep_merge`, `_validate_config`
- `openmedallion.contracts.udf` — `load_udf`, `check_return`
- `openmedallion.pipeline` — `BronzeLoader`, `SilverTransformer`, `GoldAggregator`, `BIExporter`
- `openmedallion.pipeline.nodes` — Hamilton DAG node functions
- `openmedallion.storage` — `read_parquet`, `write_parquet`, `write_csv`, `join`, `exists`,
  `mkdir`, `ls_parquets`, `copy`, `is_s3`, `storage_opts`
- `openmedallion.helpers.joins` — `join_tables`, `lookup_join`, `safe_join`, `multi_join`,
  `asof_join`, `cross_join_filtered`
- `openmedallion.helpers.windows` — `rank_within`, `row_number`, `running_total`, `lag_column`,
  `lead_column`, `pct_of_total`, `rolling_avg`, `first_last_within`
- `openmedallion.helpers.aggregations` — `attach_group_stats`, `top_n_within`,
  `pivot_to_columns`, `unpivot_columns`, `flag_outliers`
- `openmedallion.helpers.dates` — `date_trunc`, `days_between`, `classify_recency`,
  `add_calendar_columns`
- `openmedallion.scaffold.templates` — `init_project`
- `openmedallion.viz` — DAG visualiser, live-reload server, run tracker
- `medallion` CLI — `run`, `init`, `dag`, `serve`, `ui` subcommands
- S3 support via `openmedallion[s3]` optional extra (s3fs + boto3)
- LocalStack compatibility via `AWS_ENDPOINT_URL` environment variable

[Unreleased]: https://github.com/tummala-hareesh/openmedallion/compare/v2026.4.1...HEAD
[2026.4.1]: https://github.com/tummala-hareesh/openmedallion/releases/tag/v2026.4.1
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 openmedallion contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,410 @@
Metadata-Version: 2.4
Name: openmedallion
Version: 2026.4.1
Summary: Declarative medallion pipelines in pure open-source Python — local first, cloud portable, fast by default.
Project-URL: Homepage, https://github.com/tummala-hareesh/openmedallion
Project-URL: Repository, https://github.com/tummala-hareesh/openmedallion
Project-URL: Documentation, https://tummala-hareesh.github.io/openmedallion/
Project-URL: Bug Tracker, https://github.com/tummala-hareesh/openmedallion/issues
Project-URL: Changelog, https://github.com/tummala-hareesh/openmedallion/blob/main/CHANGELOG.md
License: MIT
License-File: LICENSE
Keywords: bronze silver gold,data engineering,data lakehouse,dlt,elt,etl,hamilton,medallion,pipeline,polars
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: dlt[filesystem,sql-database]>=1.4
Requires-Dist: fastapi>=0.115
Requires-Dist: polars>=1.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: sf-hamilton>=1.82
Requires-Dist: uvicorn>=0.30
Requires-Dist: websockets>=12.0
Provides-Extra: dev
Requires-Dist: pytest-anyio>=0.0; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Requires-Dist: mkdocs>=1.6; extra == 'docs'
Requires-Dist: pymdown-extensions>=10.0; extra == 'docs'
Provides-Extra: s3
Requires-Dist: boto3>=1.34; extra == 's3'
Requires-Dist: s3fs>=2024.1; extra == 's3'
Provides-Extra: viz
Requires-Dist: graphviz>=0.20; extra == 'viz'
Description-Content-Type: text/markdown

# OpenMedallion

**Declarative medallion pipelines in pure open-source Python — local first, cloud portable, fast by default.**

[](https://pypi.org/project/openmedallion/)
[](https://www.python.org/)
[](LICENSE)
[](https://github.com/tummalaahri/openmedallion/actions)

OpenMedallion is an opinionated open-source library for building **Bronze → Silver → Gold** data warehouse and lakehouse pipelines using **dlt**, **Polars**, and **Hamilton** — without depending on expensive enterprise platforms or proprietary tooling.

---

## Why OpenMedallion?

Modern open-source data tools are individually excellent — but combining them into a production-ready medallion architecture is still fragmented.

You already have great tools for ingestion, transformation, loading, orchestration, and validation. But you still have to stitch everything together yourself — writing glue code, defining project structure, creating naming conventions, managing layer boundaries, and maintaining all of it over time.

**OpenMedallion exists to reduce that friction.**

| Without OpenMedallion | With OpenMedallion |
| --- | --- |
| Glue code per project | Convention-driven project layout |
| Ad-hoc layer boundaries | Enforced Bronze / Silver / Gold contracts |
| Inline transforms | Composable Python UDFs |
| Manual orchestration | Hamilton DAG — wired automatically |
| Cloud-only dev loop | Local Parquet first, S3 with one config change |

---

## Quickstart

```bash
pip install openmedallion

medallion init my_project                 # scaffold: YAML configs + UDF stubs + kestra_flow.yml
medallion run my_project                  # Bronze → Silver → Gold in one command
medallion run my_project --layer silver   # re-run a single layer
medallion dag                             # print the Hamilton DAG
medallion serve                           # launch the live pipeline tracker UI
```

---

## Key Features

- **Declarative YAML config** — define pipeline layers without writing boilerplate
- **Incremental loads** — append and merge modes via dlt cursor columns and primary keys
- **Composable UDFs** — drop Python functions into `udf/silver/` or `udf/gold/`; no new framework to learn
- **Live DAG tracker** — Hamilton-powered web UI to visualise and monitor execution
- **Local first** — run the full pipeline against Parquet files with zero cloud credentials
- **Cloud portable** — swap `filesystem` for S3 in one line; logic stays unchanged
- **Source agnostic** — any dlt source: SQL databases, REST APIs, filesystems, and more
- **Fast by default** — Polars for all transforms; no pandas bottlenecks
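
The "local first" and "cloud portable" features above amount to treating a local directory and an `s3://` prefix the same way. The sketch below illustrates that idea with the standard library only. `is_s3_path` and `join_path` are hypothetical helpers written for this example, not the functions shipped in `openmedallion.storage`, and the bucket name is made up.

```python
import posixpath
from pathlib import Path


def is_s3_path(path: str) -> bool:
    """Return True when the path is an S3 URI (hypothetical helper)."""
    return path.startswith("s3://")


def join_path(base: str, *parts: str) -> str:
    """Join path segments for either a local directory or an s3:// prefix."""
    if is_s3_path(base):
        # S3 keys always use forward slashes, whatever the host OS is.
        return posixpath.join(base, *parts)
    return str(Path(base).joinpath(*parts))


# The same call works for both storage targets; only the base path changes.
local_target = join_path("./data/bronze", "orders.parquet")
s3_target = join_path("s3://my-bucket/bronze", "orders.parquet")  # made-up bucket
```

Keeping the branch inside one helper is what lets the rest of a pipeline stay unaware of where its files live.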

---

## How It Works

OpenMedallion wires three best-in-class open-source tools under a unified declarative config:

```text
YAML config
    │
    ▼
Hamilton DAG   ← orchestrates which layer runs and in what order
    │
    ├── Bronze (dlt)     ← ingests raw data from any source into Parquet
    ├── Silver (Polars)  ← typed UDF transforms: rename, cast, filter, enrich
    └── Gold (Polars)    ← YAML-declared group-by aggregations + window metrics
```

| Layer | Tool | Role |
| --- | --- | --- |
| 🟤 Bronze | dlt | Schema-inferred raw load from any source |
| ⚪ Silver | Polars | Typed, composable Python UDFs |
| 🟡 Gold | Polars | YAML-declared group-by metrics |
| 📤 Export | Polars | Parquet + CSV for BI tools |
| 🔗 Orchestration | Hamilton | DAG wiring with live web tracker |
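
The ordering role that the table assigns to Hamilton can be illustrated with the standard library alone. The sketch below is not Hamilton's API: it uses `graphlib.TopologicalSorter` as a stand-in, and the `LAYER_DEPS` mapping and `run_order` function are assumptions made purely for illustration.

```python
from graphlib import TopologicalSorter

# Each layer maps to the layers it depends on (assumed dependency graph).
LAYER_DEPS = {
    "bronze": set(),
    "silver": {"bronze"},
    "gold": {"silver"},
    "export": {"gold"},
}


def run_order(target: str) -> list[str]:
    """Return the layers needed to produce `target`, upstream layers first."""
    needed: set[str] = set()
    stack = [target]
    while stack:
        layer = stack.pop()
        if layer not in needed:
            needed.add(layer)
            stack.extend(LAYER_DEPS[layer])
    # Restrict the graph to the needed layers, then sort it topologically.
    subgraph = {layer: LAYER_DEPS[layer] & needed for layer in needed}
    return list(TopologicalSorter(subgraph).static_order())


full_run = run_order("export")
silver_only = run_order("silver")
```

This resolver always includes upstream layers; whether the CLI's `--layer` flag does the same is not stated in the README, so treat that as an open question.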

---

## Installation

```bash
pip install openmedallion
```

Optional extras:

```bash
pip install "openmedallion[s3]"    # S3 support via s3fs + boto3
pip install "openmedallion[viz]"   # DAG visualisation via graphviz
```

> Requires Python 3.11+

---

## Project Structure

`medallion init my_project` generates a complete, ready-to-run project:

```text
my_project/
├── main.yaml              # pipeline name + layer includes + paths
├── backend/
│   ├── bronze.yaml        # source connection + incremental config
│   ├── silver.yaml        # table transforms (rename, cast, filter, UDFs)
│   ├── gold.yaml          # aggregations (group_by + metrics + window fns)
│   └── udf/
│       ├── silver/        # Python UDFs called from silver.yaml
│       └── gold/          # Python UDFs called from gold.yaml
├── frontend/              # dashboard files (Tableau, Power BI, etc.)
├── data/                  # gitignored pipeline outputs
├── summary/               # analysis write-ups
├── kestra_flow.yml        # Kestra orchestration flow — mount via docker-compose.yml
└── README.md              # pre-filled project documentation template
```

---

## Configuration

**`main.yaml`** — declare your layers and data paths:

```yaml
pipeline:
  name: customer_warehouse

includes:
  bronze: bronze.yaml
  silver: silver.yaml
  gold: gold.yaml

paths:
  bronze: "./data/bronze"
  silver: "./data/silver"
  gold: "./data/gold"
  export: "./data/export"
```

**`silver.yaml`** — declarative transforms with optional UDFs:

```yaml
bronze_to_silver:
  tables:
    - source_file: ORDERS.parquet
      output_file: orders.parquet
      transforms:
        - type: rename
          columns:
            ORDER_ID: order_id
            CUSTOMER_ID: customer_id
        - type: cast
          columns:
            order_id: Int64
            amount: Float64
        - type: udf
          file: udf/silver/enrich.py
          function: flag_large_orders
          args:
            threshold: 500.0
```
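
To make the semantics of the `transforms` list concrete, here is a small pure-Python sketch that applies `rename` and `cast` steps, in declaration order, to rows held as dictionaries. The real Silver layer works on Polars DataFrames; `apply_transforms` and the `CASTERS` mapping are hypothetical and only mirror the YAML shape shown above.

```python
from typing import Any, Callable

# Map the type names used in the YAML to Python constructors (assumed mapping).
CASTERS: dict[str, Callable[[Any], Any]] = {"Int64": int, "Float64": float}


def apply_transforms(rows: list[dict], transforms: list[dict]) -> list[dict]:
    """Apply rename and cast steps in the order they are declared."""
    for step in transforms:
        if step["type"] == "rename":
            mapping = step["columns"]
            rows = [{mapping.get(k, k): v for k, v in row.items()} for row in rows]
        elif step["type"] == "cast":
            casts = step["columns"]
            rows = [
                {k: CASTERS[casts[k]](v) if k in casts else v for k, v in row.items()}
                for row in rows
            ]
        else:
            raise ValueError(f"unsupported transform type: {step['type']}")
    return rows


transforms = [
    {"type": "rename", "columns": {"ORDER_ID": "order_id", "CUSTOMER_ID": "customer_id"}},
    {"type": "cast", "columns": {"order_id": "Int64", "amount": "Float64"}},
]
raw = [{"ORDER_ID": "1", "CUSTOMER_ID": "c7", "amount": "19.5"}]
clean = apply_transforms(raw, transforms)
```

Order matters here: the `cast` step refers to `order_id`, which only exists once the `rename` step has run.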

**`gold.yaml`** — YAML-declared aggregations:

```yaml
silver_to_gold:
  projects:
    - name: customer_warehouse
      aggregations:
        - source_file: orders.parquet
          group_by: [customer_id]
          metrics:
            - {column: order_id, agg: count, alias: total_orders}
            - {column: amount, agg: sum, alias: total_spent}
          output_file: customer_summary.parquet
```
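
The aggregation declared above reads as "group by `customer_id`, then count orders and sum amounts". A minimal pure-Python sketch of that logic follows. It is a stand-in for the Polars-based Gold layer, and the `aggregate` function is an assumption made for illustration; it supports only the two `agg` values used in the example.

```python
from collections import defaultdict


def aggregate(rows: list[dict], group_by: list[str], metrics: list[dict]) -> list[dict]:
    """Group rows by the given keys and compute count/sum metrics per group."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_by)].append(row)

    out = []
    for key, members in groups.items():
        result = dict(zip(group_by, key))
        for metric in metrics:
            values = [r[metric["column"]] for r in members]
            if metric["agg"] == "count":
                result[metric["alias"]] = len(values)
            elif metric["agg"] == "sum":
                result[metric["alias"]] = sum(values)
            else:
                raise ValueError(f"unsupported agg: {metric['agg']}")
        out.append(result)
    return out


orders = [
    {"order_id": 1, "customer_id": "a", "amount": 100.0},
    {"order_id": 2, "customer_id": "a", "amount": 50.0},
    {"order_id": 3, "customer_id": "b", "amount": 25.0},
]
metrics = [
    {"column": "order_id", "agg": "count", "alias": "total_orders"},
    {"column": "amount", "agg": "sum", "alias": "total_spent"},
]
summary = aggregate(orders, ["customer_id"], metrics)
```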

---

## Python UDFs

Business logic stays in plain Python — no custom DSL, no magic.

```python
# udf/silver/enrich.py
import polars as pl

def flag_large_orders(df: pl.DataFrame, threshold: float = 500.0) -> pl.DataFrame:
    return df.with_columns(
        (pl.col("amount") >= threshold).alias("is_large_order")
    )
```

Drop the file next to your config, reference it in `silver.yaml`, done.
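
Referencing a UDF by `file` and `function` implies that something imports the file by path at run time. One way this could work is sketched below with `importlib`. `load_function` is a hypothetical helper, not the package's own `load_udf`, and the throwaway UDF works on plain lists so the example runs without Polars.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_function(file: str, function: str):
    """Import a Python file by path and return one named function from it."""
    spec = importlib.util.spec_from_file_location(Path(file).stem, file)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {file}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, function)


# Write a throwaway UDF file so the example runs without a real project.
with tempfile.TemporaryDirectory() as tmp:
    udf_path = Path(tmp) / "enrich.py"
    udf_path.write_text(
        "def flag_large(amounts, threshold=500.0):\n"
        "    return [a >= threshold for a in amounts]\n"
    )
    flag_large = load_function(str(udf_path), "flag_large")
    flags = flag_large([120.0, 500.0, 999.5], threshold=500.0)
```

Keyword arguments such as `threshold` correspond to the `args:` mapping in the YAML, which is why a UDF can stay an ordinary function with defaults.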

---

## Incremental Loads

OpenMedallion supports dlt's native incremental strategies out of the box:

```yaml
# bronze.yaml
source:
  type: sql_database
  dialect: sqlite
  connection_string: "sqlite:///data/mydb.db"
  tables:
    - name: orders
      incremental:
        mode: append            # cursor-based — only new rows
        cursor_column: created_at
        initial_value: "2024-01-01"
    - name: customers
      incremental:
        mode: merge             # upsert — handles updates + deletes
        primary_key: customer_id
```

dlt tracks cursor state automatically. Re-running bronze only pulls the delta.
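
The two modes can be pictured with a deliberately simplified, pure-Python sketch. It is not dlt's implementation (dlt's own state handling and deduplication are more involved); `append_new` and `merge_by_key` are hypothetical names, and deletes are left out entirely.

```python
def append_new(existing: list[dict], incoming: list[dict], cursor: str, last_value: str):
    """Append only rows whose cursor value is past the stored one."""
    # ISO-8601 date strings compare correctly as plain strings.
    fresh = [row for row in incoming if row[cursor] > last_value]
    new_last = max((row[cursor] for row in fresh), default=last_value)
    return existing + fresh, new_last


def merge_by_key(existing: list[dict], incoming: list[dict], key: str) -> list[dict]:
    """Upsert: incoming rows replace existing rows that share the same key."""
    merged = {row[key]: row for row in existing}
    merged.update({row[key]: row for row in incoming})
    return list(merged.values())


orders, cursor_state = append_new(
    existing=[{"id": 1, "created_at": "2024-01-05"}],
    incoming=[
        {"id": 1, "created_at": "2024-01-05"},  # already loaded, skipped
        {"id": 2, "created_at": "2024-02-01"},  # new row, appended
    ],
    cursor="created_at",
    last_value="2024-01-05",
)
customers = merge_by_key(
    existing=[{"customer_id": "a", "tier": "basic"}],
    incoming=[{"customer_id": "a", "tier": "gold"}, {"customer_id": "b", "tier": "basic"}],
    key="customer_id",
)
```

The returned `cursor_state` is what a second run would pass back in as `last_value`, which is the "delta" behaviour described above.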

---

## Scheduling with Kestra

`medallion init` generates a `kestra_flow.yml` inside every new project — a ready-to-use [Kestra](https://kestra.io) flow that orchestrates bronze → silver → gold with per-task observability and retry support.

### 1. Start a local Kestra server

```bash
# from the repo root — requires Docker
make kestra-up
# UI available at http://localhost:8080
```

### 2. Register a project flow

Add one volume mount to the `kestra` service in `docker-compose.yml`:

```yaml
- ./my_project/kestra_flow.yml:/app/flows/my_project.yml
```

Kestra picks up the file automatically on the next `make kestra-up` — no copying needed.

### 3. Trigger a run

From the UI at `http://localhost:8080`, or via the API:

```bash
curl -X POST \
  http://localhost:8080/api/v1/executions/openmedallion.projects/my_project
```
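
For callers who would rather trigger the run from Python, the same request can be built with the standard library. This is a sketch under the assumption that the endpoint accepts a bare POST, as the curl command suggests. `trigger_run` is a hypothetical helper and needs a running Kestra server, so nothing is sent until it is called.

```python
import urllib.request

# URL taken from the curl command above.
KESTRA_URL = "http://localhost:8080/api/v1/executions/openmedallion.projects/my_project"

# Build a POST request equivalent to the curl call; nothing is sent yet.
request = urllib.request.Request(KESTRA_URL, method="POST")


def trigger_run(req: urllib.request.Request = request) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=10) as response:
        return response.status
```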

### 4. Enable scheduled refresh

Uncomment the `triggers:` block in `kestra_flow.yml`:

```yaml
triggers:
  - id: daily_refresh
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "0 6 * * *"   # every day at 06:00 UTC
```

Restart with `make kestra-up` and Kestra picks up the change immediately.

### Kestra vs GitHub Actions

| | Kestra | GitHub Actions |
| --- | --- | --- |
| Best for | Recurring pipeline runs, local/on-prem data | CI tests + PyPI publish on tag push |
| Scheduling | Cron + backfill | Cron only, no backfill |
| Observability | Per-task logs, run history, retry from failed task | Flat job log |
| Infrastructure | Self-hosted Docker | GitHub-managed runners |

**Recommended split:** GitHub Actions for CI + publish; Kestra for pipeline scheduling.

---

## Examples

Three self-contained examples — no cloud credentials required. See [`examples/README.md`](examples/README.md) for a side-by-side comparison.

| Example | Tables | What it demonstrates |
| --- | --- | --- |
| [`local_parquet_demo/`](examples/local_parquet_demo/) | 1 | Zero-credential quickstart: full Bronze → Silver → Gold with local Parquet files |
| [`incremental_sql_demo/`](examples/incremental_sql_demo/) | 2 | Incremental append + merge from SQLite; delta load simulation |
| [`ecommerce_analytics_demo/`](examples/ecommerce_analytics_demo/) | 3 | Multi-table joins, margin analysis, and monthly trends — most complete example |

---

## When to Use OpenMedallion

A great fit if you:

- Want a **standard medallion project layout** without inventing one from scratch
- Prefer **YAML-first config** with Python escape hatches for complex logic
- Need **local-first development** that can scale to S3 with minimal changes
- Want **full ownership** of your code and infrastructure
- Are building on a **tight budget** without enterprise platform procurement

**Not a fit if you need:**

- A full enterprise data platform (Databricks, Snowflake, BigQuery)
- A no-code or drag-and-drop ETL tool
- A universal framework for every possible pipeline architecture

---

## Tradeoffs

| You get | You accept |
| --- | --- |
| Lower cost — fully open-source | More engineering responsibility than a managed platform |
| Full control over code and infrastructure | Initial setup and config learning curve |
| No vendor lock-in | You own the infrastructure decisions |
| Transparent, inspectable pipeline | Not a drag-and-drop tool |

---

## Roadmap

| Item | Status |
| --- | --- |
| Bronze / Silver / Gold pipeline | ✅ 2026.4.1 |
| Hamilton DAG + live tracker | ✅ 2026.4.1 |
| Local Parquet + S3 storage | ✅ 2026.4.1 |
| Incremental append + merge | ✅ 2026.4.1 |
| CLI scaffolding (`medallion init`) | ✅ 2026.4.1 |
| PyPI publish (OIDC trusted publishing) | ✅ 2026.4.1 |
| LazyFrame UDF contract | 🔜 2026.5 |
| Schema contract enforcement | 🔜 2026.6 |
| Lineage + metadata helpers | 🔜 2026.6 |
| Additional cloud destinations | 🔜 2026.6 |

---

## Contributing

Contributions are welcome. Good areas to contribute:

- Bug fixes and edge-case handling
- Documentation improvements and example additions
- Tests and coverage
- New pipeline templates
- New source or destination adapters
- CLI enhancements

If you are interested in open-source data architecture, your help is appreciated.

---

## License

[MIT](LICENSE) — free to use, modify, and distribute.

---

> If OpenMedallion looks useful, consider starring the repo — it helps others find it.