PyPI - engel-semantic-layer - Versions diffs - 0.2.1__tar.gz - Mend

engel-semantic-layer 0.2.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21) hide show

engel_semantic_layer-0.2.1/LICENSE ADDED Viewed

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Rasmus Engelbrecht
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

engel_semantic_layer-0.2.1/PKG-INFO ADDED Viewed

@@ -0,0 +1,512 @@
+Metadata-Version: 2.4
+Name: engel-semantic-layer
+Version: 0.2.1
+Summary: Engel-style semantic layer compiler for BigQuery
+Author: Rasmus Engelbrecht
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/rasmusengelbrecht/engel-semantic-layer-public
+Project-URL: Repository, https://github.com/rasmusengelbrecht/engel-semantic-layer-public
+Project-URL: Issues, https://github.com/rasmusengelbrecht/engel-semantic-layer-public/issues
+Keywords: semantic-layer,metrics,sql,bigquery,yaml
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3 :: Only
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Database
+Classifier: Topic :: Software Development :: Libraries
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: PyYAML>=6.0
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0; extra == "dev"
+Dynamic: license-file
+# engel-semantic-layer
+Engel-style semantic layer as a Python package.
+This package lets users define metrics/modules in YAML (same shape as Engel's code reference) and compile executable BigQuery SQL for a metric query request.
+## Scope
+- YAML loading from:
+  - one-module-per-file (`module:` root), or
+  - single-file semantic layer (`semantic_layer.modules` + `semantic_layer.cross_module_metrics`)
+- Metric types:
+  - `sum`
+  - `count`
+  - `ratio`
+  - `count-distinct`
+  - `custom-value`
+  - `custom-ratio`
+  - `derived-ratio` (cross-module, metric-on-metric, `join_on: time`)
+- Metric filters + query-time filters
+- Metric slices (`slice_name`) for base metrics
+- Time grain support: none, daily, weekly, monthly, quarterly, yearly
+- Breakdowns with dimension access checks
+- Join path traversal across modules
+- BigQuery SQL output
+## Changelog
+See [CHANGELOG.md](CHANGELOG.md) for release notes.
+## Installation
+Install from a private GitHub release wheel asset:
+```bash
+pip install \
+  "https://github.com/<owner>/engel-semantic-layer/releases/download/v0.2.0/engel_semantic_layer-0.2.0-py3-none-any.whl"
+```
+For private repos, use authenticated download (tokened URL or API download flow).
+Install from source for local development:
+```bash
+pip install -e .
+```
+## Example
+```python
+from engel_semantic_layer import QueryFilter, QueryRequest, SemanticLayer
+layer = SemanticLayer.from_path("examples/model")
+request = QueryRequest(
+    from_date="2026-01-01T00:00:00Z",
+    to_date="2026-01-31T23:59:59Z",
+    time_grain="daily",
+    breakdown_dimension_ids=["user_email"],
+    filters=[
+        QueryFilter(
+            dimension_id="user_email",
+            filter_values=["demo-user@example.com"],
+        )
+    ],
+)
+sql = layer.compile_sql(metric_id="metric_event_count", request=request)
+print(sql)
+```
+## YAML shape
+You can use either:
+- one-module-per-file (`module:` root), or
+- a single `semantic_layer.yml` with `modules` and `cross_module_metrics`.
+```yaml
+module:
+  schema: "analytics"
+  table: "records"
+  identifier: "records"
+  dimensions:
+    - column: "country"
+      type: "country"
+  metrics:
+    - identifier: "record_volume"
+      name: "Record volume"
+      ai_context:
+        synonyms:
+          - "booked value"
+          - "record amount"
+      calculation: "sum"
+      time: "records.recorded_at"
+      value: "records.amount"
+      dimensions:
+        - "this.*"
+```
+Cross-module derived ratio in single-file mode:
+```yaml
+semantic_layer:
+  modules:
+    - identifier: "wide_metric_count_a"
+      project: "demo-project-123456"
+      schema: "analytics_wide"
+      table: "wide_metric_count_a"
+      metrics:
+        - identifier: "metric_actual_primary"
+          name: "Primary Metric, Actual"
+          calculation: "sum"
+          time: "wide_metric_count_a.fulfilled_at"
+          value: "wide_metric_count_a.value_primary_amount"
+          filters:
+            - column: "settled_at"
+              operator: "is-not"
+              expression: "NULL"
+    - identifier: "entity_snapshots"
+      project: "demo-project-123456"
+      schema: "analytics_marts"
+      table: "entity_snapshots"
+      metrics:
+        - identifier: "metric_entities_active"
+          name: "# Active Entities"
+          calculation: "count-distinct"
+          time: "entity_snapshots.date"
+          distinct_on: "entity_snapshots.dim_entity_id"
+          filters:
+            - column: "is_live"
+              operator: "is"
+              expression: "true"
+  cross_module_metrics:
+    - identifier: "ratio_primary_per_entity"
+      name: "Primary Metric per Active Entity"
+      calculation: "derived-ratio"
+      numerator_metric: "metric_actual_primary"
+      denominator_metric: "metric_entities_active"
+      join_on: "time"
+      join_type: "inner" # optional: inner | left | full
+```
+API/OpenAPI field mapping (camelCase -> YAML snake_case):
+- `sqlExpression` -> `sql_expression`
+- `numeratorSql` -> `numerator_sql`
+- `denominatorSql` -> `denominator_sql`
+- `numeratorMetricId` -> `numerator_metric`
+- `denominatorMetricId` -> `denominator_metric`
+Notes:
+- `derived-ratio` currently supports `join_on: time`.
+- `join_type` supports `inner` (default), `left`, and `full` (`FULL OUTER JOIN`).
+- query request must include `timeGrain` for cross-module metrics (`timeGrain: none` is not supported for cross-module derived ratios).
+- requested breakdowns must be available on both numerator and denominator metrics.
+## Publishing
+Release asset workflow:
+- `.github/workflows/release-assets.yml`
+What it does:
+- Runs tests
+- Builds wheel + sdist
+- Uploads artifacts to the GitHub release for `v*` tags
+- On manual dispatch, can attach to a specified tag or create a draft release
+## CLI
+Validate your semantic model:
+```bash
+engel-semantic-layer validate --model-path examples/model --json
+```
+Compact summary output for CI logs:
+```bash
+engel-semantic-layer validate --model-path examples/model --json --summary-only
+```
+Validate that all metrics are compilable with synthetic dates (uses `--time-grain monthly` by default so cross-module metrics compile too):
+```bash
+engel-semantic-layer validate \
+  --model-path examples/model \
+  --json \
+  --check-compilable
+```
+Override synthetic compile time grain when needed:
+```bash
+engel-semantic-layer validate \
+  --model-path examples/semantic_layer.yml \
+  --check-compilable \
+  --time-grain weekly
+```
+Use period shortcuts instead of explicit from/to:
+```bash
+engel-semantic-layer validate \
+  --model-path examples/model \
+  --check-compilable \
+  --period "last 12 months"
+```
+Write a full validation report artifact for CI (includes warning summary, compile failures, and per-metric validation statuses):
+```bash
+engel-semantic-layer validate \
+  --model-path examples/model \
+  --check-compilable \
+  --report /tmp/semantic-validate-report.json
+```
+Fail on semantic warnings:
+```bash
+engel-semantic-layer validate \
+  --model-path examples/model \
+  --warnings-as-errors
+```
+Validate with strict column linting for all metrics:
+```bash
+engel-semantic-layer validate \
+  --model-path examples/model \
+  --json \
+  --strict-column-lint \
+  --column-registry /tmp/registry.json
+```
+List metrics:
+```bash
+engel-semantic-layer metrics --model-path examples/model
+```
+Machine-readable list + filters:
+```bash
+engel-semantic-layer metrics \
+  --model-path examples/model \
+  --format json \
+  --module fact_records \
+  --calculation sum
+```
+Inspect module join graph:
+```bash
+engel-semantic-layer joins --model-path examples/model --format json
+```
+Compile SQL from a metric + request payload:
+```bash
+engel-semantic-layer compile \
+  --model-path examples/model \
+  --metric-id metric_event_count \
+  --request /tmp/request.json \
+  --format sql
+```
+Compile a cross-module derived ratio from single-file semantic layer:
+```bash
+engel-semantic-layer compile \
+  --model-path examples/semantic_layer.yml \
+  --metric-id ratio_primary_per_entity \
+  --request /tmp/request.json \
+  --period "last 12 months" \
+  --format sql
+```
+Get compiler explain output (resolved metadata + source files):
+```bash
+engel-semantic-layer compile \
+  --model-path examples/model \
+  --metric-id metric_event_count \
+  --request /tmp/request.json \
+  --format explain
+```
+Validate request compatibility without returning SQL:
+```bash
+engel-semantic-layer compile \
+  --model-path examples/model \
+  --metric-id metric_event_count \
+  --request /tmp/request.json \
+  --dry-run-validate
+```
+Use period shortcuts in compile:
+```bash
+engel-semantic-layer compile \
+  --model-path examples/model \
+  --metric-id metric_event_distinct_count \
+  --request /tmp/request.json \
+  --period "Q1 2025"
+```
+Supported period formats:
+- `last x years`
+- `last x months`
+- `last x quarters`
+- `last x days`
+- `current year` (YTD)
+- `current month` (MTD)
+- `QX YYYY` (e.g. `Q2 2025`)
+- `YYYY` (e.g. `2025`)
+- `MM-YYYY` (e.g. `02-2025`)
+Time comparison options in request payload (`timeComparison`):
+- Percentage comparisons: `YoY`, `MoM`
+- Comparison values: `Last Year`, `Last Month`
+- Daily-only percentage comparison: `YoY (Match Weekday)`
+`YoY (Match Weekday)` requires `timeGrain: daily` and compares to 364 days prior.
+With `timeGrain: none`, comparisons are done as full-window totals (e.g. YTD vs last YTD).
+Target comparison option in request payload:
+- `targetSeries` (e.g. `budget_current`, `stretch_target`)
+Period handling options in request payload:
+- `excludeOpenPeriod` (`true|false`)
+- `excludeToday` (`true|false`)
+When `excludeOpenPeriod` is true, the compiler excludes the currently open period bucket
+for the selected `timeGrain` (e.g. current month for `monthly`, current day for `daily`).
+When `excludeToday` is true, the compiler clips `toDate` by one day before other period logic.
+Cumulative options in request payload:
+- `cumulativeMode`: `mtd`, `ytd`, `rolling_days`
+- `rollingDays`: positive integer (required only for `rolling_days`)
+Validation rules:
+- `mtd` requires `timeGrain: daily`
+- `rolling_days` requires `timeGrain: daily`
+- `ytd` works with daily/weekly/monthly/quarterly/yearly grains
+- cumulative modes require a time grain (not `none`)
+Output columns for cumulative mode:
+- `metric_base` (original bucket metric)
+- `metric` (cumulative metric)
+- if `targetSeries` is set: `target_value_base` + cumulative `target_value`
+When `targetSeries` is provided, the compiler joins aggregated targets from
+`demo-project-123456.analytics_prod.fct_targets_monthly` by metric identifier,
+selected date range, and breakdown dimensions (if any), and returns:
+- `target_value`
+- `target_comparison_percentage`
+Target source can be overridden in CLI with:
+- `--target-project`
+- `--target-schema`
+- `--target-table`
+With `--strict-column-lint`, target columns used for series/metric/time/value,
+breakdowns, and dimension filters are validated against the column registry.
+End-to-end request example (comparison + target + cumulative + period clipping):
+```json
+{
+  "fromDate": "2026-01-01T00:00:00Z",
+  "toDate": "2026-12-31T23:59:59Z",
+  "timeGrain": "monthly",
+  "timeComparison": "YoY",
+  "targetSeries": "budget_current",
+  "cumulativeMode": "ytd",
+  "excludeOpenPeriod": true,
+  "excludeToday": true,
+  "breakdownDimensionIds": ["segment_group"],
+  "filters": [
+    {
+      "dimensionId": "segment_group",
+      "filterValues": ["SegmentA"]
+    }
+  ]
+}
+```
+You can also pipe request JSON through stdin:
+```bash
+echo '{"fromDate":"2026-01-01T00:00:00Z","toDate":"2026-01-31T23:59:59Z"}' | \
+  engel-semantic-layer compile \
+    --model-path examples/model \
+    --metric-id metric_event_distinct_count \
+    --request -
+```
+Compile with strict column linting:
+```bash
+engel-semantic-layer compile \
+  --model-path examples/model \
+  --metric-id metric_actual_secondary \
+  --request /tmp/request.json \
+  --strict-column-lint \
+  --column-registry /tmp/registry.json
+```
+## Optional strict column linting
+You can enforce table+column existence checks during compile by providing a column registry.
+```python
+from engel_semantic_layer import ColumnRegistry, QueryRequest, SemanticLayer
+registry = ColumnRegistry.from_dict(
+    {
+        "analytics_wide": {
+            "wide_metric_count_a": ["recorded_at", "fulfilled_at", "value_secondary_amount", "status"]
+        }
+    }
+)
+layer = SemanticLayer.from_path(
+    "examples/model",
+    strict_column_lint=True,
+    column_registry=registry,
+)
+sql = layer.compile_sql(
+    "metric_actual_secondary",
+    QueryRequest(from_date="2026-01-01T00:00:00Z", to_date="2026-01-31T23:59:59Z"),
+)
+```
+You can also construct the registry from `INFORMATION_SCHEMA.COLUMNS` rows using
+`ColumnRegistry.from_information_schema_rows(...)`.
+CLI helper:
+```bash
+engel-semantic-layer registry from-information-schema \
+  --input /tmp/information_schema_rows.json \
+  --output /tmp/registry.json
+```
+Input supports both:
+- JSON array (`[ {...}, {...} ]`)
+- JSONL (one JSON object per line)
+## Testing
+The test suite includes SQL snapshot tests for representative metrics. Run:
+```bash
+uv run --extra dev pytest -q
+```
+If SQL output intentionally changes, update snapshot files in `tests/snapshots/`.
+CLI exit codes:
+- `0` success
+- `2` validation/parse/compile input error
+For machine-readable failures, use `--json-errors`:
+```bash
+engel-semantic-layer --json-errors compile ...
+```
+## Notes
+- This compiler focuses on BigQuery SQL generation.
+- Output aims to be Engel-like SQL, but this is an independent implementation.
+- Strict validation is enabled for basics: duplicate IDs, unsupported operators/calculations, invalid metric field combinations, and disallowed breakdown/filter dimensions.
+- Custom SQL metric expressions are normalized for BigQuery string literals (double-quoted strings are converted to single-quoted literals).
+- Custom SQL reference linting is enabled: dotted references in custom expressions are validated and unknown table references fail at compile time.
+- Query compilation guards against invalid breakdown requests (max 2 dimensions, no duplicates, no alias collisions).
+- Missing join-path errors are actionable and include metric + module context plus reference sources (which fields caused the dependency).