PyPI: semantic-query-compiler 0.1.0__tar.gz

Files changed (35)

semantic_query_compiler-0.1.0/.github/workflows/ci.yml ADDED

@@ -0,0 +1,26 @@
+name: ci
+on:
+  pull_request:
+  push:
+    branches: [master, main]
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v5
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+      - name: Set up Python
+        run: uv python install 3.12
+      - name: Install dependencies
+        run: uv sync --extra dev
+      - name: Lint
+        run: uv run ruff check .
+      - name: Test
+        run: uv run pytest

semantic_query_compiler-0.1.0/.gitignore ADDED

@@ -0,0 +1,8 @@
+.venv/
+__pycache__/
+*.py[cod]
+.pytest_cache/
+.ruff_cache/
+build/
+dist/
+*.egg-info/

semantic_query_compiler-0.1.0/CHANGELOG.md ADDED

@@ -0,0 +1,32 @@
+# Changelog
+## 0.1.0 - beta
+Initial beta release of `semantic-query-compiler`.
+### Included
+- YAML semantic model loader with `semantic`, `model`, `semantic_layer`, `modules`, and single `module` roots.
+- CLI commands for model inspection, validation, SQL compilation, result comparison, and column-registry generation.
+- BigQuery, DuckDB, Postgres, and Snowflake SQL rendering.
+- Aggregate, count, count-distinct, ratio, custom-value, custom-ratio, and cross-module derived-ratio metrics.
+- Metric filters, query filters, named slices, breakdowns, joined breakdowns, and safe multi-hop joins over `many-to-one` / `one-to-one` relationships.
+- Period shortcuts including closed trailing periods such as `last 24 complete months`.
+- Time comparison, cumulative windows, targets, target comparisons, and ratio target handling.
+- Custom SQL parsing/transpilation with optional function allowlists.
+- Column registry generation from `information_schema` JSON/JSONL and dbt `manifest.json`.
+- Result comparison harness for DuckDB and BigQuery.
+- Fresh-install DuckDB quickstart example.
+### Support level
+- BigQuery: execution-tested and used as the primary warehouse validation target.
+- DuckDB: execution-tested, including copied real-model data checks across all metrics and key request shapes.
+- Postgres and Snowflake: compile/parse-tested; execution validation is not part of this beta.
+### Known limits
+- This is a beta, not a production-trust/v1 release.
+- Dashboard/export parity against trusted business reports is still required before relying on a migrated model for production decisions.
+- Postgres and Snowflake support should be treated as SQL rendering support until execution tests are added.
+- The compiler owns SQL generation and comparison helpers; warehouse execution is intentionally left to caller tooling.

semantic_query_compiler-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,565 @@
+Metadata-Version: 2.4
+Name: semantic-query-compiler
+Version: 0.1.0
+Summary: Agent-friendly semantic metric compiler with dialect-aware SQL rendering
+Project-URL: Homepage, https://github.com/rasmusengelbrecht/semantic-query-compiler
+Project-URL: Repository, https://github.com/rasmusengelbrecht/semantic-query-compiler
+Project-URL: Issues, https://github.com/rasmusengelbrecht/semantic-query-compiler/issues
+Author: Rasmus Engelbrecht
+License-Expression: MIT
+Keywords: analytics,bigquery,duckdb,metrics,semantic-layer,sql
+Classifier: Development Status :: 3 - Alpha
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Information Technology
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Database
+Classifier: Topic :: Software Development :: Code Generators
+Requires-Python: >=3.11
+Requires-Dist: duckdb>=1.1
+Requires-Dist: pydantic<2.12,>=2.7
+Requires-Dist: pyyaml>=6.0
+Requires-Dist: sqlglot>=30.0
+Provides-Extra: bigquery
+Requires-Dist: google-cloud-bigquery>=3.0; extra == 'bigquery'
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0; extra == 'dev'
+Requires-Dist: ruff>=0.8; extra == 'dev'
+Description-Content-Type: text/markdown
+# semantic-query-compiler
+Agent-friendly semantic metric compiler with dialect-aware SQL rendering.
+`semantic-query-compiler` turns governed YAML metric definitions into inspectable SQL for warehouses such as BigQuery, DuckDB, Postgres, and Snowflake. It is designed for CLI and agent workflows: discover metrics, validate a model, compile SQL, explain the query plan, and compare results against trusted SQL when you need confidence.
+The compiler is intentionally separate from warehouse execution. It owns semantic definitions, validation, planning, SQL rendering, and comparison helpers. Run compiled SQL with your warehouse client, BI tool, notebook, dbt workflow, or agent runtime.
+Current status: `0.1.0` beta. BigQuery and DuckDB have execution coverage; Postgres and Snowflake are compile/parse-tested. See `docs/support-matrix.md` for support levels and v1 release gates.
+## Install
+From a checkout:
+```bash
+uv venv
+uv pip install -e '.[dev]'
+semantic --help
+```
+From GitHub:
+```bash
+uv tool install 'semantic-query-compiler @ git+https://github.com/rasmusengelbrecht/semantic-query-compiler.git'
+```
+BigQuery comparison support is optional; install the `bigquery` extra (or the `google-cloud-bigquery` dependency directly):
+```bash
+uv pip install -e '.[bigquery]'
+```
+## Quickstart
+Create starter files:
+```bash
+semantic init --dir demo-semantic --with-duckdb-example
+cd demo-semantic
+semantic validate --model semantic.yml --check-compilable
+semantic compile revenue --model semantic.yml --period "current month" --request request.json --format sql
+semantic compare-metric revenue \
+  --model semantic.yml \
+  --period "current month" \
+  --request request.json \
+  --reference reference_revenue.sql \
+  --dialect duckdb \
+  --engine duckdb \
+  --setup setup.sql
+```
+A runnable DuckDB example lives in `examples/quickstart/`.
+## Define a semantic model
+Create a YAML file, for example `semantic.yml`. The recommended root key is `semantic`; `model` and `semantic_layer` are also accepted root keys.
+```yaml
+semantic:
+  version: 1
+  modules:
+    - identifier: bookings
+      project: demo-project
+      schema: analytics
+      table: bookings
+      dimensions:
+        - column: country
+          type: categorical
+        - column: channel
+          type: categorical
+      metrics:
+        - identifier: booking_count
+          name: Booking Count
+          calculation: count
+          time: bookings.created_at
+          filters:
+            - column: state
+              operator: in
+              expression: confirmed,completed,cancelled
+          slices:
+            - name: confirmed_only
+              filter:
+                column: state
+                operator: equals
+                expression: confirmed
+          dimensions:
+            - country
+            - channel
+        - identifier: revenue
+          name: Revenue
+          calculation: sum
+          time: bookings.created_at
+          value: bookings.revenue
+          filters:
+            - column: state
+              operator: in
+              expression: confirmed,completed,cancelled
+          dimensions:
+            - country
+            - channel
+        - identifier: average_daily_rate
+          name: Average Daily Rate
+          calculation: ratio
+          time: bookings.created_at
+          numerator: bookings.gross_booking_value
+          denominator: bookings.nights
+          dimensions:
+            - country
+        - identifier: request_success_rate
+          name: Request Success Rate
+          calculation: custom-ratio
+          time: bookings.created_at
+          numerator_sql: COUNT(CASE WHEN requested_at IS NOT NULL AND state IN ('confirmed', 'completed') THEN id END)
+          denominator_sql: COUNT(CASE WHEN requested_at IS NOT NULL THEN id END)
+          allowed_functions: [COUNT, IF]
+          dimensions:
+            - country
+    - identifier: ad_spend
+      project: demo-project
+      schema: analytics
+      table: marketing_costs
+      dimensions:
+        - column: country
+          type: categorical
+      metrics:
+        - identifier: marketing_spend
+          name: Marketing Spend
+          calculation: sum
+          time: marketing_costs.date
+          value: marketing_costs.cost
+          dimensions:
+            - country
+  cross_module_metrics:
+    - identifier: roas
+      name: ROAS
+      calculation: derived-ratio
+      numerator_metric: revenue
+      denominator_metric: marketing_spend
+      join_on: time
+      join_type: inner
+      dimensions:
+        - country
+```
+Supported metric calculations:
+- `sum`
+- `count`
+- `count-distinct`
+- `ratio` as safe `SUM(numerator) / SUM(denominator)`
+- `custom-value`
+- `custom-ratio`
+- cross-module `derived-ratio`
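
Reviewer note: the `ratio` calculation listed above divides two sums, which is not the same as averaging per-row ratios. The Python sketch below illustrates the difference. It assumes that "safe" means the result is `NULL` (`None` here) when the denominator sums to zero; the sample rows and both helper functions are illustrative and are not part of the package.

```python
# Hypothetical booking rows; column names follow the average_daily_rate example.
rows = [
    {"gross_booking_value": 300.0, "nights": 3},
    {"gross_booking_value": 100.0, "nights": 2},
]


def ratio_of_sums(rows, numerator, denominator):
    """SUM(numerator) / SUM(denominator), or None when the denominator sums to zero."""
    total_num = sum(row[numerator] for row in rows)
    total_den = sum(row[denominator] for row in rows)
    # Assumption: a zero denominator yields None instead of raising.
    return total_num / total_den if total_den else None


def average_of_ratios(rows, numerator, denominator):
    """Per-row ratio averaged afterwards; shown only for contrast."""
    ratios = [row[numerator] / row[denominator] for row in rows]
    return sum(ratios) / len(ratios)


print(ratio_of_sums(rows, "gross_booking_value", "nights"))      # 80.0
print(average_of_ratios(rows, "gross_booking_value", "nights"))  # 75.0
print(ratio_of_sums([], "gross_booking_value", "nights"))        # None
```

The ratio of sums weights each booking by its number of nights, which is usually what a rate metric should report.
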
+Supported metric and slice filter operators:
+- `equals`, `not-equals`
+- `is`, `is-not`
+- `in`, `not-in`
+- `like`, `not-like`
+- `greater-than`, `greater-than-or-equal`
+- `less-than`, `less-than-or-equal`
+- `null`, `not-null`, `is-null`, `is-not-null`
+## Create a request shape
+For CLI use, prefer `--period` for the date window and keep the request file focused on query shape:
+```json
+{
+  "timeGrain": "monthly",
+  "breakdownDimensionIds": ["country"],
+  "filters": [
+    { "dimensionId": "country", "filterValues": ["DK"] }
+  ]
+}
+```
+Then compile with a named period:
+```bash
+semantic compile revenue \
+  --model semantic.yml \
+  --period "last 24 complete months" \
+  --request request.json \
+  --dialect bigquery
+```
+Supported request features:
+- `timeGrain`: `none`, `daily`, `weekly`, `monthly`, `quarterly`, `yearly`
+- `breakdownDimensionIds`
+- `sliceName` for named metric slice filters
+- query filters
+- `timeComparison`: `YoY`, `MoM`, `Last Year`, `Last Month`, `YoY Match Weekday`
+- `cumulativeMode`: `ytd`, `mtd`, `rolling_days`
+- `targetSeries` with explicit target table mapping
+- `excludeOpenPeriod` and `excludeToday` period clipping
+For API use, or when you need exact boundaries, `fromDate` and `toDate` can still be provided directly in the request JSON.
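
Reviewer note: a request file can be sanity-checked before it is passed to `semantic compile`. The sketch below only tests the three enumerated fields against the values listed above. The `check_request` helper is hypothetical; the package's own validation may be stricter or behave differently.

```python
# Allowed values copied from the "Supported request features" list.
SUPPORTED = {
    "timeGrain": {"none", "daily", "weekly", "monthly", "quarterly", "yearly"},
    "timeComparison": {"YoY", "MoM", "Last Year", "Last Month", "YoY Match Weekday"},
    "cumulativeMode": {"ytd", "mtd", "rolling_days"},
}


def check_request(request: dict) -> list[str]:
    """Return one message per enumerated field that holds an unlisted value."""
    problems = []
    for key, allowed in SUPPORTED.items():
        if key in request and request[key] not in allowed:
            problems.append(f"{key}: unsupported value {request[key]!r}")
    return problems


good = {"timeGrain": "monthly", "breakdownDimensionIds": ["country"]}
bad = {"timeGrain": "hourly", "cumulativeMode": "ytd"}

print(check_request(good))  # []
print(check_request(bad))   # ["timeGrain: unsupported value 'hourly'"]
```
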
+Supported period shortcuts:
+- `current year` (year-to-date)
+- `current month` (month-to-date)
+- `last N days|weeks|months|quarters|years`
+- `last N complete days|weeks|months|quarters|years`
+- `Q1 2026`
+- `2026`
+- `01-2026`
+For trailing closed-period reporting, prefer `complete` periods. Example: `--period "last 24 complete months"` returns 24 closed monthly buckets. For current-period-to-date reporting, combine a relative period with `excludeOpenPeriod` in the request. Example: current-year YTD through the last complete month:
+```json
+{
+  "timeGrain": "monthly",
+  "cumulativeMode": "ytd",
+  "timeComparison": "YoY",
+  "targetSeries": "budget_current",
+  "excludeOpenPeriod": true
+}
+```
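
Reviewer note: the date arithmetic behind a shortcut such as `last 24 complete months` can be sketched in a few lines of Python. This assumes a half-open window (`start` inclusive, `end_exclusive` exclusive) and a caller-supplied "today"; how the compiler actually represents the window is not shown in this diff, and `last_complete_months` is a hypothetical helper.

```python
from datetime import date


def last_complete_months(n: int, today: date) -> tuple[date, date]:
    """Return (start, end_exclusive) covering the n most recent closed months."""
    # The open month starts on the first of today's month; closed months end there.
    end_exclusive = today.replace(day=1)
    # Step back n months through a running month index to avoid day overflow.
    month_index = end_exclusive.year * 12 + (end_exclusive.month - 1) - n
    start = date(month_index // 12, month_index % 12 + 1, 1)
    return start, end_exclusive


start, end_exclusive = last_complete_months(24, date(2026, 3, 15))
print(start.isoformat(), end_exclusive.isoformat())  # 2024-03-01 2026-03-01
```

With "today" set to 2026-03-15, the window spans March 2024 through February 2026: 24 closed monthly buckets, with the open month of March 2026 excluded.
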
+## Compile SQL
+```bash
+semantic compile revenue \
+  --model semantic.yml \
+  --period "last 24 complete months" \
+  --request request.json \
+  --dialect bigquery \
+  --format sql
+```
+Explain the generated plan:
+```bash
+semantic compile revenue \
+  --model semantic.yml \
+  --period "current year" \
+  --request request.json \
+  --dialect bigquery \
+  --format explain
+```
+Write SQL to a file:
+```bash
+semantic compile revenue \
+  --model semantic.yml \
+  --period "current year" \
+  --request request.json \
+  --dialect bigquery \
+  --output query.sql
+```
+## Inspect and validate a model
+```bash
+semantic init --dir demo-semantic --with-duckdb-example
+semantic metrics --model semantic.yml --format json
+semantic search-metrics "gross booking value" --model semantic.yml --format json
+semantic describe revenue --model semantic.yml --format json
+semantic joins --model semantic.yml --format json
+```
+`metrics`, `search-metrics`, and `describe` include discovery metadata such as descriptions, `ai_context` synonyms, teams, generic metadata, and relationship/formula notes when present in the model.
+Run validation:
+```bash
+semantic validate \
+  --model semantic.yml \
+  --check-compilable \
+  --format json
+```
+Optional: for stronger governance, validate against a column registry. This catches model drift such as missing tables/columns, stale warehouse metadata, and unsafe joins when uniqueness metadata is available:
+```bash
+semantic validate \
+  --model semantic.yml \
+  --column-registry column_registry.json \
+  --max-registry-age-hours 24 \
+  --require-join-uniqueness \
+  --check-compilable \
+  --format json
+```
+A registry can be generated from warehouse `information_schema` exports or a dbt `manifest.json` when you have those available:
+```bash
+semantic registry-from-information-schema \
+  --input information_schema_columns.jsonl \
+  --warehouse warehouse-prod \
+  --output column_registry.json
+semantic registry-from-dbt-manifest \
+  --input target/manifest.json \
+  --warehouse warehouse-prod \
+  --output column_registry.json
+```
+## Targets
+Targets are optional benchmark/budget/goal values that can be joined onto metric results. They live outside the semantic model in a normal warehouse table. The compiler only needs to know which table to read and how its columns map to the generic target contract.
+A simple unsegmented target table has one row per metric, target series, and time period:
+```sql
+CREATE TABLE analytics.metric_targets (
+  metric_id TEXT,
+  target_series TEXT,
+  metric_time DATE,
+  target_value DOUBLE
+);
+```
+Example rows:
+| metric_id | target_series | metric_time | target_value |
+| --- | --- | --- | ---: |
+| revenue | budget_current | 2026-01-01 | 125000 |
+| revenue | budget_current | 2026-02-01 | 130000 |
+| booking_count | budget_current | 2026-01-01 | 420 |
+Then request a target series:
+```json
+{
+  "timeGrain": "monthly",
+  "targetSeries": "budget_current"
+}
+```
+Compile with the target table:
+```bash
+semantic compile revenue \
+  --model semantic.yml \
+  --period "current year" \
+  --request target-request.json \
+  --target-table warehouse.analytics.metric_targets \
+  --format sql
+```
+The default target column contract is:
+| Purpose | Default column |
+| --- | --- |
+| Metric identifier | `metric_id` |
+| Target series / version | `target_series` |
+| Target period start | `metric_time` |
+| Target value | `target_value` |
+| Breakdown dimensions | one column per grouped breakdown alias |
+If your table uses different names, map them at compile time:
+```bash
+semantic compile revenue \
+  --model semantic.yml \
+  --period "current year" \
+  --request target-request.json \
+  --target-table warehouse.analytics.metric_targets \
+  --target-metric-column metric \
+  --target-series-column series \
+  --target-time-column time \
+  --target-value-column value \
+  --format sql
+```
+For breakdown targets, add one target column per breakdown dimension. The grain becomes one row per metric, target series, time period, and target dimension combination. For example, a monthly `revenue` target broken down by `country` and `channel` should have one row for each `(metric_id, target_series, metric_time, country, channel)` combination.
+The target column should use the same name as the breakdown alias, or the dimension can set `target_column` in the semantic model:
+```yaml
+- column: country
+  type: categorical
+  target_column: market
+```
+Example segmented target table:
+```sql
+CREATE TABLE analytics.metric_targets (
+  metric_id TEXT,
+  target_series TEXT,
+  metric_time DATE,
+  market TEXT,
+  channel TEXT,
+  target_value DOUBLE
+);
+```
+Example segmented rows:
+| metric_id | target_series | metric_time | market | channel | target_value |
+| --- | --- | --- | --- | --- | ---: |
+| revenue | budget_current | 2026-01-01 | DK | paid_search | 45000 |
+| revenue | budget_current | 2026-01-01 | DK | organic | 30000 |
+| revenue | budget_current | 2026-01-01 | SE | paid_search | 25000 |
+| revenue | budget_current | 2026-01-01 | SE | organic | 25000 |
+When compiling `breakdownDimensionIds: ["country", "channel"]`, the compiler joins target rows on `metric_id`, `target_series`, `metric_time`, `market`, and `channel`, then returns `target_value` and `target_comparison_percentage`. If the request does not include a breakdown, matching target rows are summed to the requested time grain.
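
Reviewer note: the summing behaviour described above can be checked against the segmented example rows. The sketch uses Python's built-in `sqlite3` module purely as a stand-in engine so that it runs without extra dependencies; the queries are hand-written illustrations, not SQL emitted by the compiler.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE metric_targets (
      metric_id TEXT, target_series TEXT, metric_time TEXT,
      market TEXT, channel TEXT, target_value REAL
    )
    """
)
# The four segmented example rows from the table above.
conn.executemany(
    "INSERT INTO metric_targets VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("revenue", "budget_current", "2026-01-01", "DK", "paid_search", 45000),
        ("revenue", "budget_current", "2026-01-01", "DK", "organic", 30000),
        ("revenue", "budget_current", "2026-01-01", "SE", "paid_search", 25000),
        ("revenue", "budget_current", "2026-01-01", "SE", "organic", 25000),
    ],
)

# No breakdown requested: matching rows collapse to one value per period.
total = conn.execute(
    """
    SELECT metric_time, SUM(target_value)
    FROM metric_targets
    WHERE metric_id = ? AND target_series = ?
    GROUP BY metric_time
    ORDER BY metric_time
    """,
    ("revenue", "budget_current"),
).fetchall()

# Breakdown by country only: channel rows are summed within each market.
by_market = conn.execute(
    """
    SELECT metric_time, market, SUM(target_value)
    FROM metric_targets
    WHERE metric_id = ? AND target_series = ?
    GROUP BY metric_time, market
    ORDER BY metric_time, market
    """,
    ("revenue", "budget_current"),
).fetchall()

print(total)      # [('2026-01-01', 125000.0)]
print(by_market)  # [('2026-01-01', 'DK', 75000.0), ('2026-01-01', 'SE', 50000.0)]
```

The unsegmented total of 125000 equals the January `revenue` value in the unsegmented example table earlier in the section.
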
+If one physical target table stores multiple overlapping grains, pass each possible target dimension column with `--target-dimension-column`:
+```bash
+semantic compile revenue \
+  --model semantic.yml \
+  --period "current year" \
+  --request country-request.json \
+  --target-table warehouse.analytics.metric_targets \
+  --target-dimension-column market \
+  --target-dimension-column channel \
+  --format sql
+```
+The compiler then picks the lowest matching target grain for the requested breakdown before summing targets. This prevents duplicate totals when a table contains both total rows and segmented rows. Use `--target-null-column` only when you want to force a fixed-grain target by requiring a target dimension column to be `NULL`.
+Targets can be combined with time comparison and cumulative modes. Target columns describe current-period target comparison; historical comparison columns describe historical metric comparison.
+Ratio target semantics:
+- additive metrics sum matching target rows
+- same-module `ratio` metrics try to find matching numerator and denominator `sum` metrics with the same time/filter shape, then compute `SUM(numerator target) / SUM(denominator target)`
+- `custom-ratio` metrics use the direct target metric aggregate (`SUM(value)`) because their numerator/denominator are arbitrary SQL snippets, not reusable metric IDs
+- cross-module `derived-ratio` metrics compute `SUM(numerator metric target) / SUM(denominator metric target)`
+- these component target semantics also apply when targets are combined with `timeComparison` or `cumulativeMode`
+That means a direct target row for the ratio metric itself is not used when component target metrics are available. If a component target is missing, the ratio target is `NULL`.
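
Reviewer note: the component-target rule for ratio metrics can be written out as a small Python function. This is a sketch of the semantics described above, not the compiler's implementation. The `ratio_target` helper and the `marketing_spend` target value are hypothetical, and returning `None` for a zero denominator is an added assumption.

```python
def ratio_target(targets, numerator_id, denominator_id):
    """Compute SUM(numerator targets) / SUM(denominator targets).

    `targets` maps a metric identifier to the target values that matched the
    requested period and filters. Returns None when a component is missing.
    """
    numerator_values = targets.get(numerator_id)
    denominator_values = targets.get(denominator_id)
    if not numerator_values or not denominator_values:
        return None  # a missing component target makes the ratio target NULL
    denominator_total = sum(denominator_values)
    if denominator_total == 0:
        return None  # assumption: treat a zero denominator like NULL
    return sum(numerator_values) / denominator_total


# Hypothetical January targets; only the revenue value appears in the source.
targets = {"revenue": [125000.0], "marketing_spend": [25000.0]}

print(ratio_target(targets, "revenue", "marketing_spend"))              # 5.0
print(ratio_target({"revenue": [125000.0]}, "revenue", "marketing_spend"))  # None
```
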
+Request filters are applied to target rows too. The compiler maps filter dimensions through `target_column` metadata before filtering the target table.
+## Custom SQL governance
+Custom metrics can use SQL snippets:
+```yaml
+- identifier: postgres_custom
+  name: Postgres Custom
+  calculation: custom-value
+  time: bookings.created_at
+  custom_sql_dialect: postgres
+  allowed_functions: [SUM, CAST]
+  sql_expression: "SUM(gross_booking_value::INT)"
+```
+Behavior:
+- snippets are parsed with SQLGlot
+- `custom_sql_dialect` parses from a source dialect and renders into the requested target dialect
+- bare columns are qualified with the base table alias
+- `allowed_functions` is optional; when present, functions outside the allowlist fail compilation
+SQLGlot models `CASE WHEN ... THEN ... END` branches as `IF`, so include `IF` in `allowed_functions` for governed CASE expressions.
+## Compare results
+Compare compiled SQL for one metric against reference SQL:
+```bash
+semantic compare-metric revenue \
+  --model semantic.yml \
+  --request request.json \
+  --reference reference.sql \
+  --engine bigquery \
+  --project my-gcp-project \
+  --location EU \
+  --format json
+```
+Compare two arbitrary SQL statements:
+```bash
+semantic compare-results \
+  --left current.sql \
+  --right reference.sql \
+  --engine duckdb \
+  --setup setup.sql \
+  --format json
+```
+Use this for adoption checks: compare compiler output to your own trusted SQL, dashboards, or finance-approved reports before relying on a metric in production.
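
Reviewer note: the idea behind a result comparison can be sketched in Python as "run both statements, then diff the row sets". The example uses the built-in `sqlite3` module as a stand-in engine; the `compare_results` function, the sample table, and both queries are illustrative and do not reproduce the package's harness or its output format.

```python
import sqlite3
from collections import Counter


def compare_results(conn, left_sql, right_sql):
    """Run both statements and report rows that appear on only one side."""
    # Counter keeps duplicate rows, so the comparison is order-insensitive
    # but still sensitive to how many times each row occurs.
    left = Counter(conn.execute(left_sql).fetchall())
    right = Counter(conn.execute(right_sql).fetchall())
    return {
        "match": left == right,
        "only_left": sorted((left - right).elements()),
        "only_right": sorted((right - left).elements()),
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (country TEXT, state TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [("DK", "confirmed", 100.0), ("DK", "draft", 40.0), ("SE", "completed", 60.0)],
)

compiled = """
    SELECT country, SUM(revenue) FROM bookings
    WHERE state IN ('confirmed', 'completed', 'cancelled') GROUP BY country
"""
reference = """
    SELECT country, SUM(revenue) FROM bookings
    WHERE state <> 'draft' GROUP BY country
"""
unfiltered = "SELECT country, SUM(revenue) FROM bookings GROUP BY country"

print(compare_results(conn, compiled, reference)["match"])   # True
print(compare_results(conn, compiled, unfiltered))
```

The second call reports a mismatch because the unfiltered query includes the `draft` booking for `DK`.
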
+See `docs/support-matrix.md` for current beta support levels and v1 release gates.
+## Execute compiled SQL
+The compiler does not own warehouse execution. Write SQL to a file and run it with your warehouse client, BI tool, notebook, dbt operation, or agent runtime:
+```bash
+semantic compile revenue \
+  --model semantic.yml \
+  --period "current year" \
+  --request request.json \
+  --dialect bigquery \
+  --output query.sql
+```
+## What is implemented
+- YAML semantic model loader
+- metric listing, metadata-aware search, description, validation, and join graph CLI
+- SQL / JSON / explain output
+- BigQuery, DuckDB, Postgres, and Snowflake dialect rendering
+- aggregate, ratio, custom, and cross-module metrics
+- safe multi-hop joins over `many-to-one` / `one-to-one` relationships
+- metric filters, query filters, breakdowns, and time grains
+- time comparisons
+- cumulative windows
+- target joins and target/comparison/cumulative combinations
+- custom SQL source-dialect transpilation and function allowlists
+- result comparison harness for DuckDB and BigQuery
+- column registry generation from `information_schema` and dbt `manifest.json`
+- dialect matrix tests that compile and parse representative BigQuery, DuckDB, Postgres, and Snowflake SQL
+- CI lint/test gate
+## Development
+```bash
+uv venv
+uv pip install -e '.[dev]'
+uv run ruff check .
+uv run pytest
+uv build
+```