engel-semantic-layer 0.2.1__tar.gz

@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Rasmus Engelbrecht
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,512 @@
+ Metadata-Version: 2.4
+ Name: engel-semantic-layer
+ Version: 0.2.1
+ Summary: Engel-style semantic layer compiler for BigQuery
+ Author: Rasmus Engelbrecht
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/rasmusengelbrecht/engel-semantic-layer-public
+ Project-URL: Repository, https://github.com/rasmusengelbrecht/engel-semantic-layer-public
+ Project-URL: Issues, https://github.com/rasmusengelbrecht/engel-semantic-layer-public/issues
+ Keywords: semantic-layer,metrics,sql,bigquery,yaml
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Developers
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3 :: Only
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Database
+ Classifier: Topic :: Software Development :: Libraries
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: PyYAML>=6.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=8.0; extra == "dev"
+ Dynamic: license-file
+
+ # engel-semantic-layer
+
+ Engel-style semantic layer as a Python package.
+
+ This package lets you define metrics and modules in YAML (in the same shape as Engel's code reference) and compile a metric query request into executable BigQuery SQL.
+
+ ## Scope
+
+ - YAML loading from:
+   - one-module-per-file (`module:` root), or
+   - a single-file semantic layer (`semantic_layer.modules` + `semantic_layer.cross_module_metrics`)
+ - Metric types:
+   - `sum`
+   - `count`
+   - `ratio`
+   - `count-distinct`
+   - `custom-value`
+   - `custom-ratio`
+   - `derived-ratio` (cross-module, metric-on-metric, `join_on: time`)
+ - Metric filters + query-time filters
+ - Metric slices (`slice_name`) for base metrics
+ - Time grain support: none, daily, weekly, monthly, quarterly, yearly
+ - Breakdowns with dimension access checks
+ - Join path traversal across modules
+ - BigQuery SQL output
+
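The time grains listed above decide how rows are bucketed before aggregation. The sketch below is a rough plain-Python illustration, not the package's code. The hypothetical helper `bucket_start` maps a date to the first day of its bucket, similar to what BigQuery's `DATE_TRUNC` does in SQL. The Monday week start (BigQuery's `ISOWEEK`) is an assumption, because the README does not say which week start the compiler uses.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical helper, not part of engel-semantic-layer.
TIME_GRAINS = ("none", "daily", "weekly", "monthly", "quarterly", "yearly")


def bucket_start(day: date, grain: str) -> Optional[date]:
    """Return the first day of the time bucket containing `day`.

    Returns None for grain "none", where the whole window is one bucket.
    """
    if grain not in TIME_GRAINS:
        raise ValueError(f"unsupported time grain: {grain!r}")
    if grain == "none":
        return None
    if grain == "daily":
        return day
    if grain == "weekly":
        # Assumption: Monday-based weeks, like BigQuery's ISOWEEK.
        return day - timedelta(days=day.weekday())
    if grain == "monthly":
        return day.replace(day=1)
    if grain == "quarterly":
        first_month = 3 * ((day.month - 1) // 3) + 1
        return date(day.year, first_month, 1)
    return date(day.year, 1, 1)  # yearly


if __name__ == "__main__":
    print(bucket_start(date(2026, 5, 14), "quarterly"))  # 2026-04-01
```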
+ ## Changelog
+
+ See [CHANGELOG.md](CHANGELOG.md) for release notes.
+
+ ## Installation
+
+ Install from a private GitHub release wheel asset:
+
+ ```bash
+ pip install \
+   "https://github.com/<owner>/engel-semantic-layer/releases/download/v0.2.0/engel_semantic_layer-0.2.0-py3-none-any.whl"
+ ```
+
+ For private repositories, the download must be authenticated (for example, with a tokenized URL or the GitHub API download flow).
+
+ Install from source for local development:
+
+ ```bash
+ pip install -e .
+ ```
+
+ ## Example
+
+ ```python
+ from engel_semantic_layer import QueryFilter, QueryRequest, SemanticLayer
+
+ layer = SemanticLayer.from_path("examples/model")
+
+ request = QueryRequest(
+     from_date="2026-01-01T00:00:00Z",
+     to_date="2026-01-31T23:59:59Z",
+     time_grain="daily",
+     breakdown_dimension_ids=["user_email"],
+     filters=[
+         QueryFilter(
+             dimension_id="user_email",
+             filter_values=["demo-user@example.com"],
+         )
+     ],
+ )
+
+ sql = layer.compile_sql(metric_id="metric_event_count", request=request)
+ print(sql)
+ ```
+
+ ## YAML shape
+
+ You can use either:
+ - one-module-per-file (`module:` root), or
+ - a single `semantic_layer.yml` with `modules` and `cross_module_metrics`.
+
+ ```yaml
+ module:
+   schema: "analytics"
+   table: "records"
+   identifier: "records"
+   dimensions:
+     - column: "country"
+       type: "country"
+   metrics:
+     - identifier: "record_volume"
+       name: "Record volume"
+       ai_context:
+         synonyms:
+           - "booked value"
+           - "record amount"
+       calculation: "sum"
+       time: "records.recorded_at"
+       value: "records.amount"
+       dimensions:
+         - "this.*"
+ ```
+
+ Cross-module derived ratio in single-file mode:
+
+ ```yaml
+ semantic_layer:
+   modules:
+     - identifier: "wide_metric_count_a"
+       project: "demo-project-123456"
+       schema: "analytics_wide"
+       table: "wide_metric_count_a"
+       metrics:
+         - identifier: "metric_actual_primary"
+           name: "Primary Metric, Actual"
+           calculation: "sum"
+           time: "wide_metric_count_a.fulfilled_at"
+           value: "wide_metric_count_a.value_primary_amount"
+           filters:
+             - column: "settled_at"
+               operator: "is-not"
+               expression: "NULL"
+
+     - identifier: "entity_snapshots"
+       project: "demo-project-123456"
+       schema: "analytics_marts"
+       table: "entity_snapshots"
+       metrics:
+         - identifier: "metric_entities_active"
+           name: "# Active Entities"
+           calculation: "count-distinct"
+           time: "entity_snapshots.date"
+           distinct_on: "entity_snapshots.dim_entity_id"
+           filters:
+             - column: "is_live"
+               operator: "is"
+               expression: "true"
+
+   cross_module_metrics:
+     - identifier: "ratio_primary_per_entity"
+       name: "Primary Metric per Active Entity"
+       calculation: "derived-ratio"
+       numerator_metric: "metric_actual_primary"
+       denominator_metric: "metric_entities_active"
+       join_on: "time"
+       join_type: "inner" # optional: inner | left | full
+ ```
+
+ API/OpenAPI field mapping (camelCase -> YAML snake_case):
+
+ - `sqlExpression` -> `sql_expression`
+ - `numeratorSql` -> `numerator_sql`
+ - `denominatorSql` -> `denominator_sql`
+ - `numeratorMetricId` -> `numerator_metric`
+ - `denominatorMetricId` -> `denominator_metric`
+
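Two of the renames above are not mechanical camelCase-to-snake_case conversions, because `numeratorMetricId` and `denominatorMetricId` also drop the `Id` suffix. A lookup table is therefore safer than a generic converter. The sketch below is minimal. The helper name `api_metric_to_yaml` is hypothetical and is not part of the package API.

```python
from typing import Any, Dict

# Hypothetical helper, not part of engel-semantic-layer.
# Documented API/OpenAPI field names -> YAML field names.
API_TO_YAML: Dict[str, str] = {
    "sqlExpression": "sql_expression",
    "numeratorSql": "numerator_sql",
    "denominatorSql": "denominator_sql",
    # Irregular: the "Id" suffix is dropped, not converted to "_id".
    "numeratorMetricId": "numerator_metric",
    "denominatorMetricId": "denominator_metric",
}


def api_metric_to_yaml(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Rename documented API keys to YAML keys; other keys pass through unchanged."""
    return {API_TO_YAML.get(key, key): value for key, value in payload.items()}
```

For example, `api_metric_to_yaml({"numeratorMetricId": "metric_actual_primary"})` yields `{"numerator_metric": "metric_actual_primary"}`.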
+ Notes:
+ - `derived-ratio` currently supports only `join_on: time`.
+ - `join_type` supports `inner` (default), `left`, and `full` (`FULL OUTER JOIN`).
+ - The query request must include a `timeGrain` for cross-module metrics (`timeGrain: none` is not supported for cross-module derived ratios).
+ - Requested breakdowns must be available on both the numerator and the denominator metric.
+
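The sketch below makes `join_on: time` and the three `join_type` options concrete. It works in plain Python on per-bucket totals that have already been aggregated. It illustrates the semantics only and is not the SQL the compiler emits. The function name is hypothetical. Returning `None` for a missing or zero denominator (the way BigQuery's `SAFE_DIVIDE` returns `NULL`) is an assumption, since the README does not specify it.

```python
from typing import Dict, List, Optional, Tuple


# Hypothetical helper, not part of engel-semantic-layer.
def derived_ratio(
    numerator: Dict[str, float],
    denominator: Dict[str, float],
    join_type: str = "inner",
) -> List[Tuple[str, Optional[float]]]:
    """Join two per-bucket series on their time bucket and divide them."""
    if join_type == "inner":
        buckets = numerator.keys() & denominator.keys()  # buckets present in both
    elif join_type == "left":
        buckets = set(numerator)  # every numerator bucket
    elif join_type == "full":
        buckets = numerator.keys() | denominator.keys()  # FULL OUTER JOIN
    else:
        raise ValueError(f"unsupported join_type: {join_type!r}")

    rows = []
    for bucket in sorted(buckets):
        num = numerator.get(bucket)
        den = denominator.get(bucket)
        # Assumption: like SAFE_DIVIDE, a missing or zero side yields None (NULL).
        ratio = num / den if num is not None and den else None
        rows.append((bucket, ratio))
    return rows
```

In SQL, the same idea corresponds to an `INNER`, `LEFT`, or `FULL OUTER JOIN` of two aggregated subqueries on the bucket column.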
+ ## Publishing
+
+ Release asset workflow:
+
+ - `.github/workflows/release-assets.yml`
+
+ What it does:
+ - Runs tests
+ - Builds wheel + sdist
+ - Uploads artifacts to the GitHub release for `v*` tags
+ - On manual dispatch, can attach the artifacts to a specified tag or create a draft release
+
+ ## CLI
+
+ Validate your semantic model:
+
+ ```bash
+ engel-semantic-layer validate --model-path examples/model --json
+ ```
+
+ Compact summary output for CI logs:
+
+ ```bash
+ engel-semantic-layer validate --model-path examples/model --json --summary-only
+ ```
+
+ Validate that every metric compiles, using synthetic dates (the default `--time-grain monthly` ensures that cross-module metrics compile too):
+
+ ```bash
+ engel-semantic-layer validate \
+   --model-path examples/model \
+   --json \
+   --check-compilable
+ ```
+
+ Override the synthetic compile time grain when needed:
+
+ ```bash
+ engel-semantic-layer validate \
+   --model-path examples/semantic_layer.yml \
+   --check-compilable \
+   --time-grain weekly
+ ```
+
+ Use period shortcuts instead of explicit from/to dates:
+
+ ```bash
+ engel-semantic-layer validate \
+   --model-path examples/model \
+   --check-compilable \
+   --period "last 12 months"
+ ```
+
+ Write a full validation report artifact for CI (it includes a warning summary, compile failures, and per-metric validation statuses):
+
+ ```bash
+ engel-semantic-layer validate \
+   --model-path examples/model \
+   --check-compilable \
+   --report /tmp/semantic-validate-report.json
+ ```
+
+ Fail on semantic warnings:
+
+ ```bash
+ engel-semantic-layer validate \
+   --model-path examples/model \
+   --warnings-as-errors
+ ```
+
+ Validate with strict column linting for all metrics:
+
+ ```bash
+ engel-semantic-layer validate \
+   --model-path examples/model \
+   --json \
+   --strict-column-lint \
+   --column-registry /tmp/registry.json
+ ```
+
+ List metrics:
+
+ ```bash
+ engel-semantic-layer metrics --model-path examples/model
+ ```
+
+ Machine-readable list + filters:
+
+ ```bash
+ engel-semantic-layer metrics \
+   --model-path examples/model \
+   --format json \
+   --module fact_records \
+   --calculation sum
+ ```
+
+ Inspect the module join graph:
+
+ ```bash
+ engel-semantic-layer joins --model-path examples/model --format json
+ ```
+
+ Compile SQL from a metric + request payload:
+
+ ```bash
+ engel-semantic-layer compile \
+   --model-path examples/model \
+   --metric-id metric_event_count \
+   --request /tmp/request.json \
+   --format sql
+ ```
+
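The `--request` file is a JSON object that uses the camelCase keys shown in the README's end-to-end request example (`fromDate`, `toDate`, `timeGrain`, `breakdownDimensionIds`, `filters`). Below is a small, hypothetical helper for generating that file from Python. It only emits keys the README documents and leaves all validation to the CLI. The names `build_request` and `write_request` are illustrative and are not part of the package.

```python
import json
from typing import Dict, List, Optional


# Hypothetical helpers, not part of engel-semantic-layer.
def build_request(
    from_date: str,
    to_date: str,
    time_grain: Optional[str] = None,
    breakdowns: Optional[List[str]] = None,
    filters: Optional[Dict[str, List[str]]] = None,
) -> dict:
    """Build a request payload using the camelCase keys documented in this README."""
    payload: dict = {"fromDate": from_date, "toDate": to_date}
    if time_grain is not None:
        payload["timeGrain"] = time_grain
    if breakdowns:
        payload["breakdownDimensionIds"] = list(breakdowns)
    if filters:
        # {"dimension": [values]} -> [{"dimensionId": ..., "filterValues": [...]}]
        payload["filters"] = [
            {"dimensionId": dim, "filterValues": list(values)}
            for dim, values in filters.items()
        ]
    return payload


def write_request(payload: dict, path: str) -> None:
    """Write the payload as JSON to a file usable with `--request <path>`."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2)
```

For example, `write_request(build_request("2026-01-01T00:00:00Z", "2026-01-31T23:59:59Z", time_grain="daily"), "/tmp/request.json")` produces a file for the compile command above.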
+ Compile a cross-module derived ratio from a single-file semantic layer:
+
+ ```bash
+ engel-semantic-layer compile \
+   --model-path examples/semantic_layer.yml \
+   --metric-id ratio_primary_per_entity \
+   --request /tmp/request.json \
+   --period "last 12 months" \
+   --format sql
+ ```
+
+ Get compiler explain output (resolved metadata + source files):
+
+ ```bash
+ engel-semantic-layer compile \
+   --model-path examples/model \
+   --metric-id metric_event_count \
+   --request /tmp/request.json \
+   --format explain
+ ```
+
+ Validate request compatibility without returning SQL:
+
+ ```bash
+ engel-semantic-layer compile \
+   --model-path examples/model \
+   --metric-id metric_event_count \
+   --request /tmp/request.json \
+   --dry-run-validate
+ ```
+
+ Use period shortcuts in compile:
+
+ ```bash
+ engel-semantic-layer compile \
+   --model-path examples/model \
+   --metric-id metric_event_distinct_count \
+   --request /tmp/request.json \
+   --period "Q1 2025"
+ ```
+
+ Supported period formats:
+ - `last x years`
+ - `last x months`
+ - `last x quarters`
+ - `last x days`
+ - `current year` (YTD)
+ - `current month` (MTD)
+ - `QX YYYY` (e.g. `Q2 2025`)
+ - `YYYY` (e.g. `2025`)
+ - `MM-YYYY` (e.g. `02-2025`)
+
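One way to resolve these shortcuts into an inclusive date range is sketched below. The README does not define the exact window boundaries, so this sketch uses an assumed rolling interpretation: `last 12 months` runs from the same calendar day twelve months ago through `today`, with month-end clamping. It also assumes that `last x days` starts x days before `today`. `resolve_period` is a hypothetical name, not the package's parser.

```python
import calendar
import re
from datetime import date, timedelta
from typing import Tuple


# Hypothetical helpers, not part of engel-semantic-layer.
def _shift_months(day: date, months: int) -> date:
    """Shift `day` by whole months, clamping to the target month's last day."""
    index = day.year * 12 + (day.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(day.day, last_day))


def _month_end(year: int, month: int) -> date:
    return date(year, month, calendar.monthrange(year, month)[1])


def resolve_period(text: str, today: date) -> Tuple[date, date]:
    """Resolve a period shortcut to an inclusive (from, to) date range."""
    s = " ".join(text.lower().split())  # normalize case and whitespace
    m = re.fullmatch(r"last (\d+) (day|month|quarter|year)s?", s)
    if m:
        n, unit = int(m.group(1)), m.group(2)
        if unit == "day":
            return today - timedelta(days=n), today
        months = {"month": 1, "quarter": 3, "year": 12}[unit] * n
        # Assumption: a rolling window ending today, not whole calendar periods.
        return _shift_months(today, -months), today
    if s == "current year":  # YTD
        return date(today.year, 1, 1), today
    if s == "current month":  # MTD
        return today.replace(day=1), today
    m = re.fullmatch(r"q([1-4]) (\d{4})", s)
    if m:
        quarter, year = int(m.group(1)), int(m.group(2))
        first_month = 3 * (quarter - 1) + 1
        return date(year, first_month, 1), _month_end(year, first_month + 2)
    if re.fullmatch(r"\d{4}", s):
        year = int(s)
        return date(year, 1, 1), date(year, 12, 31)
    m = re.fullmatch(r"(\d{2})-(\d{4})", s)
    if m:
        month, year = int(m.group(1)), int(m.group(2))
        return date(year, month, 1), _month_end(year, month)
    raise ValueError(f"unsupported period: {text!r}")
```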
+ Time comparison options in request payload (`timeComparison`):
+ - Percentage comparisons: `YoY`, `MoM`
+ - Comparison values: `Last Year`, `Last Month`
+ - Daily-only percentage comparison: `YoY (Match Weekday)`
+
+ `YoY (Match Weekday)` requires `timeGrain: daily` and compares to 364 days prior (exactly 52 weeks, so each day is compared with the same weekday a year earlier).
+ With `timeGrain: none`, comparisons are computed on full-window totals (e.g. YTD vs. last year's YTD).
+
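The date arithmetic behind these options can be sketched as follows. A shift of 364 days (52 * 7) always lands on the same weekday, which a calendar-year shift does not guarantee. The Feb 29 fallback, the month-end clamping for `MoM`, and the percentage formula `(current - previous) / previous` are all assumptions for illustration, because the README does not specify them. Both function names are hypothetical.

```python
import calendar
from datetime import date, timedelta
from typing import Optional


# Hypothetical helpers, not part of engel-semantic-layer.
def comparison_date(day: date, mode: str) -> date:
    """Return the date that `day` is compared against for a timeComparison mode."""
    if mode == "YoY (Match Weekday)":
        return day - timedelta(days=364)  # 52 * 7 days: same weekday
    if mode in ("YoY", "Last Year"):
        if (day.month, day.day) == (2, 29):
            return date(day.year - 1, 2, 28)  # assumption: clamp the leap day
        return day.replace(year=day.year - 1)
    if mode in ("MoM", "Last Month"):
        year, month = (day.year, day.month - 1) if day.month > 1 else (day.year - 1, 12)
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, min(day.day, last_day))  # assumption: clamp month end
    raise ValueError(f"unsupported timeComparison: {mode!r}")


def percentage_change(current: float, previous: Optional[float]) -> Optional[float]:
    """Assumed YoY/MoM percentage formula; None when there is no usable baseline."""
    if previous is None or previous == 0:
        return None
    return (current - previous) / previous
```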
+ Target comparison option in request payload:
+ - `targetSeries` (e.g. `budget_current`, `stretch_target`)
+
+ Period handling options in request payload:
+ - `excludeOpenPeriod` (`true|false`)
+ - `excludeToday` (`true|false`)
+
+ When `excludeOpenPeriod` is true, the compiler excludes the currently open period bucket
+ for the selected `timeGrain` (e.g. the current month for `monthly`, the current day for `daily`).
+
+ When `excludeToday` is true, the compiler moves `toDate` back by one day before applying any other period logic.
+
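The sketch below illustrates the period clipping described above. It works on plain dates rather than the timestamps used in the request payload. It assumes that the "currently open" bucket is the one containing `today`, that weeks start on Monday, and that `excludeOpenPeriod` has no meaning for `timeGrain: none`. The order of operations (`excludeToday` first) follows the README. `effective_to_date` is a hypothetical name.

```python
from datetime import date, timedelta


# Hypothetical helpers, not part of engel-semantic-layer.
def _open_bucket_start(today: date, grain: str) -> date:
    """First day of the bucket that contains `today` (the still-open period)."""
    if grain == "daily":
        return today
    if grain == "weekly":  # assumption: Monday-based weeks
        return today - timedelta(days=today.weekday())
    if grain == "monthly":
        return today.replace(day=1)
    if grain == "quarterly":
        return date(today.year, 3 * ((today.month - 1) // 3) + 1, 1)
    if grain == "yearly":
        return date(today.year, 1, 1)
    # Assumption: there is no open bucket without a time grain.
    raise ValueError(f"excludeOpenPeriod needs a time grain, got {grain!r}")


def effective_to_date(
    to_date: date,
    today: date,
    grain: str,
    exclude_today: bool = False,
    exclude_open_period: bool = False,
) -> date:
    """Apply excludeToday, then excludeOpenPeriod, to the requested toDate."""
    if exclude_today:
        # Per the README: move toDate back one day before other period logic.
        to_date -= timedelta(days=1)
    if exclude_open_period:
        open_start = _open_bucket_start(today, grain)
        if to_date >= open_start:
            to_date = open_start - timedelta(days=1)  # end of the last closed bucket
    return to_date
```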
+ Cumulative options in request payload:
+ - `cumulativeMode`: `mtd`, `ytd`, `rolling_days`
+ - `rollingDays`: positive integer (required only for `rolling_days`)
+
+ Validation rules:
+ - `mtd` requires `timeGrain: daily`
+ - `rolling_days` requires `timeGrain: daily`
+ - `ytd` works with daily/weekly/monthly/quarterly/yearly grains
+ - cumulative modes require a time grain (not `none`)
+
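The validation rules above translate almost directly into code. The sketch below is minimal and assumes that violations are reported by raising `ValueError`. The CLI itself reports input errors with exit code `2`. `validate_cumulative` is a hypothetical name.

```python
from typing import Optional

# Hypothetical validator, not part of engel-semantic-layer.
CUMULATIVE_MODES = {"mtd", "ytd", "rolling_days"}
TIME_GRAINS = {"daily", "weekly", "monthly", "quarterly", "yearly"}


def validate_cumulative(
    mode: Optional[str],
    time_grain: Optional[str],
    rolling_days: Optional[int] = None,
) -> None:
    """Raise ValueError if the cumulative options break the documented rules."""
    if mode is None:
        return  # no cumulative mode requested
    if mode not in CUMULATIVE_MODES:
        raise ValueError(f"unknown cumulativeMode: {mode!r}")
    if time_grain is None or time_grain == "none":
        raise ValueError("cumulative modes require a time grain (not 'none')")
    if time_grain not in TIME_GRAINS:
        raise ValueError(f"unknown timeGrain: {time_grain!r}")
    if mode in ("mtd", "rolling_days") and time_grain != "daily":
        raise ValueError(f"{mode} requires timeGrain: daily")
    if mode == "rolling_days":
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(rolling_days, bool) or not isinstance(rolling_days, int) or rolling_days < 1:
            raise ValueError("rollingDays must be a positive integer")
```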
+ Output columns for cumulative mode:
+ - `metric_base` (original bucket metric)
+ - `metric` (cumulative metric)
+ - if `targetSeries` is set: `target_value_base` + cumulative `target_value`
+
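The sketch below shows how `metric` could be derived from `metric_base` for each mode, using an already-bucketed series. The reset rules for `ytd` and `mtd` follow from their names. For `rolling_days`, the sketch assumes a trailing window that includes the current day and treats missing days as zero. It illustrates the semantics only and is not the SQL the compiler emits. `cumulative` is a hypothetical name.

```python
from collections import deque
from datetime import date, timedelta
from typing import Deque, Dict, List, Optional, Tuple


# Hypothetical helper, not part of engel-semantic-layer.
def cumulative(
    rows: List[Tuple[date, float]],
    mode: str,
    rolling_days: Optional[int] = None,
) -> List[Dict[str, object]]:
    """Turn per-bucket values (metric_base) into a cumulative series (metric)."""
    out: List[Dict[str, object]] = []
    running = 0.0
    window: Deque[Tuple[date, float]] = deque()  # rows inside the rolling window
    previous: Optional[date] = None
    for bucket, value in sorted(rows):
        if mode == "ytd":
            if previous is None or bucket.year != previous.year:
                running = 0.0  # new year: restart the running total
            running += value
            metric = running
        elif mode == "mtd":
            if previous is None or (bucket.year, bucket.month) != (previous.year, previous.month):
                running = 0.0  # new month: restart the running total
            running += value
            metric = running
        elif mode == "rolling_days":
            if not rolling_days or rolling_days < 1:
                raise ValueError("rolling_days mode needs a positive rolling_days")
            window.append((bucket, value))
            earliest = bucket - timedelta(days=rolling_days - 1)
            while window[0][0] < earliest:
                window.popleft()  # drop days that fell out of the window
            metric = sum(v for _, v in window)
        else:
            raise ValueError(f"unknown cumulative mode: {mode!r}")
        out.append({"bucket": bucket, "metric_base": value, "metric": metric})
        previous = bucket
    return out
```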
+ When `targetSeries` is provided, the compiler joins aggregated targets from
+ `demo-project-123456.analytics_prod.fct_targets_monthly` by metric identifier,
+ selected date range, and breakdown dimensions (if any), and returns:
+ - `target_value`
+ - `target_comparison_percentage`
+
+ The target source can be overridden in the CLI with:
+ - `--target-project`
+ - `--target-schema`
+ - `--target-table`
+
+ With `--strict-column-lint`, target columns used for series/metric/time/value,
+ breakdowns, and dimension filters are validated against the column registry.
+
+ End-to-end request example (comparison + target + cumulative + period clipping):
+
+ ```json
+ {
+   "fromDate": "2026-01-01T00:00:00Z",
+   "toDate": "2026-12-31T23:59:59Z",
+   "timeGrain": "monthly",
+   "timeComparison": "YoY",
+   "targetSeries": "budget_current",
+   "cumulativeMode": "ytd",
+   "excludeOpenPeriod": true,
+   "excludeToday": true,
+   "breakdownDimensionIds": ["segment_group"],
+   "filters": [
+     {
+       "dimensionId": "segment_group",
+       "filterValues": ["SegmentA"]
+     }
+   ]
+ }
+ ```
+
+ You can also pipe request JSON through stdin:
+
+ ```bash
+ echo '{"fromDate":"2026-01-01T00:00:00Z","toDate":"2026-01-31T23:59:59Z"}' | \
+   engel-semantic-layer compile \
+     --model-path examples/model \
+     --metric-id metric_event_distinct_count \
+     --request -
+ ```
+
+ Compile with strict column linting:
+
+ ```bash
+ engel-semantic-layer compile \
+   --model-path examples/model \
+   --metric-id metric_actual_secondary \
+   --request /tmp/request.json \
+   --strict-column-lint \
+   --column-registry /tmp/registry.json
+ ```
+
+ ## Optional strict column linting
+
+ You can enforce table+column existence checks during compilation by providing a column registry.
+
+ ```python
+ from engel_semantic_layer import ColumnRegistry, QueryRequest, SemanticLayer
+
+ registry = ColumnRegistry.from_dict(
+     {
+         "analytics_wide": {
+             "wide_metric_count_a": ["recorded_at", "fulfilled_at", "value_secondary_amount", "status"]
+         }
+     }
+ )
+
+ layer = SemanticLayer.from_path(
+     "examples/model",
+     strict_column_lint=True,
+     column_registry=registry,
+ )
+
+ sql = layer.compile_sql(
+     "metric_actual_secondary",
+     QueryRequest(from_date="2026-01-01T00:00:00Z", to_date="2026-01-31T23:59:59Z"),
+ )
+ ```
+
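At its core, the existence check looks up each `table.column` reference in the registry for the module's schema. The sketch below is minimal and operates on a plain dict with the same nesting as the `ColumnRegistry.from_dict` argument above. `missing_columns` is hypothetical, and exact, case-sensitive matching is an assumption.

```python
from typing import Dict, Iterable, List

# Hypothetical helper, not part of engel-semantic-layer.
# Same nesting as the ColumnRegistry.from_dict argument: schema -> table -> columns.
Registry = Dict[str, Dict[str, List[str]]]


def missing_columns(registry: Registry, schema: str, refs: Iterable[str]) -> List[str]:
    """Return the `table.column` references that the registry does not know about."""
    tables = registry.get(schema, {})
    missing: List[str] = []
    for ref in refs:
        table, _, column = ref.partition(".")
        if not column:
            raise ValueError(f"expected 'table.column', got {ref!r}")
        # Assumption: exact, case-sensitive matching of table and column names.
        if column not in tables.get(table, []):
            missing.append(ref)
    return missing
```

Against the registry above, a reference such as `wide_metric_count_a.settled_at` would be reported as missing, because that column is not listed.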
+ You can also construct the registry from `INFORMATION_SCHEMA.COLUMNS` rows using
+ `ColumnRegistry.from_information_schema_rows(...)`.
+
+ CLI helper:
+
+ ```bash
+ engel-semantic-layer registry from-information-schema \
+   --input /tmp/information_schema_rows.json \
+   --output /tmp/registry.json
+ ```
+
+ Input supports both:
+ - JSON array (`[ {...}, {...} ]`)
+ - JSONL (one JSON object per line)
+
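Both input formats can be handled by checking whether the text starts with `[`. BigQuery's `INFORMATION_SCHEMA.COLUMNS` view exposes `table_schema`, `table_name`, and `column_name` columns. That the CLI expects exactly these keys is an assumption. The sketch below builds the same nested `{schema: {table: [columns]}}` shape that is passed to `ColumnRegistry.from_dict`. `load_rows` and `rows_to_registry` are hypothetical names.

```python
import json
from typing import Dict, List

# Hypothetical helpers, not part of engel-semantic-layer.
Registry = Dict[str, Dict[str, List[str]]]


def load_rows(text: str) -> List[dict]:
    """Parse either a JSON array of row objects or JSONL (one object per line)."""
    stripped = text.lstrip()
    if stripped.startswith("["):
        return json.loads(stripped)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def rows_to_registry(rows: List[dict]) -> Registry:
    """Group INFORMATION_SCHEMA.COLUMNS rows into schema -> table -> [columns]."""
    registry: Registry = {}
    for row in rows:
        columns = registry.setdefault(row["table_schema"], {}).setdefault(row["table_name"], [])
        if row["column_name"] not in columns:  # keep first-seen order, skip duplicates
            columns.append(row["column_name"])
    return registry
```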
+ ## Testing
+
+ The test suite includes SQL snapshot tests for representative metrics. Run:
+
+ ```bash
+ uv run --extra dev pytest -q
+ ```
+
+ If SQL output intentionally changes, update snapshot files in `tests/snapshots/`.
+
+ CLI exit codes:
+ - `0` success
+ - `2` validation/parse/compile input error
+
+ For machine-readable failures, use `--json-errors`:
+
+ ```bash
+ engel-semantic-layer --json-errors compile ...
+ ```
+
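In CI scripts, it can be convenient to turn the documented exit codes into exceptions. The sketch below is minimal and uses `subprocess.run`. `run_cli` and `SemanticLayerInputError` are hypothetical names. The README does not say whether `--json-errors` writes to stdout or stderr, so the error message falls back from one to the other.

```python
import subprocess
from typing import List


# Hypothetical wrapper, not part of engel-semantic-layer.
class SemanticLayerInputError(RuntimeError):
    """Exit code 2: validation, parse, or compile input error."""


def run_cli(argv: List[str]) -> str:
    """Run a CLI command and return stdout, raising on a non-zero exit code."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode == 0:
        return result.stdout
    message = result.stderr.strip() or result.stdout.strip()
    if result.returncode == 2:
        raise SemanticLayerInputError(message)
    raise RuntimeError(f"unexpected exit code {result.returncode}: {message}")
```

For example, `run_cli(["engel-semantic-layer", "--json-errors", "compile", "--model-path", "examples/model", "--metric-id", "metric_event_count", "--request", "/tmp/request.json"])` returns the compiled SQL, or raises `SemanticLayerInputError` on an input error.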
+ ## Notes
+
+ - This compiler focuses on BigQuery SQL generation.
+ - Output aims to be Engel-like SQL, but this is an independent implementation.
+ - Strict validation is enabled for basics: duplicate IDs, unsupported operators/calculations, invalid metric field combinations, and disallowed breakdown/filter dimensions.
+ - Custom SQL metric expressions are normalized for BigQuery string literals (double-quoted strings are converted to single-quoted literals).
+ - Custom SQL reference linting is enabled: dotted references in custom expressions are validated and unknown table references fail at compile time.
+ - Query compilation guards against invalid breakdown requests (max 2 dimensions, no duplicates, no alias collisions).
+ - Missing join-path errors are actionable and include metric + module context plus reference sources (which fields caused the dependency).
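Two of the behaviours noted above can be sketched in a few lines. `normalize_string_literals` rewrites double-quoted literals as single-quoted ones, leaving existing single-quoted literals and backtick-quoted identifiers untouched. `check_breakdowns` applies the three breakdown guards. Both names are hypothetical. Backslash escaping follows BigQuery's string-literal rules. Deriving an alias by lowercasing and replacing non-word characters with `_` is an assumption about how alias collisions arise.

```python
import re
from typing import List

# Hypothetical helpers, not part of engel-semantic-layer.
MAX_BREAKDOWNS = 2


def normalize_string_literals(sql: str) -> str:
    """Rewrite "double-quoted" string literals as 'single-quoted' ones."""
    out: List[str] = []
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if ch in ("'", "`"):
            # Copy single-quoted literals and `backtick` identifiers verbatim.
            j = i + 1
            while j < n and sql[j] != ch:
                j += 2 if sql[j] == "\\" else 1  # skip escaped characters
            if j >= n:
                raise ValueError(f"unterminated {ch} quote at position {i}")
            out.append(sql[i:j + 1])
            i = j + 1
        elif ch == '"':
            j = i + 1
            body: List[str] = []
            while j < n and sql[j] != '"':
                if sql[j] == "\\" and j + 1 < n:
                    body.append(sql[j:j + 2])  # keep existing escapes as-is
                    j += 2
                elif sql[j] == "'":
                    body.append("\\'")  # a bare ' must be escaped inside '...'
                    j += 1
                else:
                    body.append(sql[j])
                    j += 1
            if j >= n:
                raise ValueError(f'unterminated " quote at position {i}')
            out.append("'" + "".join(body) + "'")
            i = j + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def check_breakdowns(dimension_ids: List[str]) -> List[str]:
    """Apply the breakdown guards and return the (assumed) output aliases."""
    if len(dimension_ids) > MAX_BREAKDOWNS:
        raise ValueError(f"at most {MAX_BREAKDOWNS} breakdown dimensions are allowed")
    if len(set(dimension_ids)) != len(dimension_ids):
        raise ValueError("duplicate breakdown dimension")
    # Assumption: aliases are case-insensitive and non-word characters become "_".
    aliases = [re.sub(r"\W", "_", dim).lower() for dim in dimension_ids]
    if len(set(aliases)) != len(aliases):
        raise ValueError(f"breakdown alias collision: {aliases}")
    return aliases
```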