PyPI - boring-semantic-layer - Versions diffs - 0.0.1__tar.gz - Mend

boring-semantic-layer 0.0.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26) hide show

boring_semantic_layer-0.0.1/.codespell.ignore-words +0 -0
boring_semantic_layer-0.0.1/.envrc +5 -0
boring_semantic_layer-0.0.1/.envrc.user +1 -0
boring_semantic_layer-0.0.1/.gitignore +11 -0
boring_semantic_layer-0.0.1/.gitignore.template +5 -0
boring_semantic_layer-0.0.1/.pre-commit-config.yaml +33 -0
boring_semantic_layer-0.0.1/PKG-INFO +495 -0
boring_semantic_layer-0.0.1/README.md +481 -0
boring_semantic_layer-0.0.1/examples/basic_example.py +71 -0
boring_semantic_layer-0.0.1/examples/example_flight_query.py +44 -0
boring_semantic_layer-0.0.1/examples/example_flight_semantic_model.py +53 -0
boring_semantic_layer-0.0.1/examples/example_query_join_many.py +79 -0
boring_semantic_layer-0.0.1/examples/materialize_example.py +45 -0
boring_semantic_layer-0.0.1/examples/mcp_example.py +237 -0
boring_semantic_layer-0.0.1/flake.lock +99 -0
boring_semantic_layer-0.0.1/flake.nix +360 -0
boring_semantic_layer-0.0.1/pyproject.toml +34 -0
boring_semantic_layer-0.0.1/requirements-dev.txt +275 -0
boring_semantic_layer-0.0.1/src/boring_semantic_layer/__init__.py +2 -0
boring_semantic_layer-0.0.1/src/boring_semantic_layer/semantic_model.py +985 -0
boring_semantic_layer-0.0.1/tests/test_join_factories.py +121 -0
boring_semantic_layer-0.0.1/tests/test_malloy_style_joins.py +141 -0
boring_semantic_layer-0.0.1/tests/test_materialize.py +122 -0
boring_semantic_layer-0.0.1/tests/test_semantic_model.py +887 -0
boring_semantic_layer-0.0.1/tt.md +0 -0
boring_semantic_layer-0.0.1/uv.lock +1550 -0

boring_semantic_layer-0.0.1/.codespell.ignore-words ADDED Viewed

File without changes

boring_semantic_layer-0.0.1/.envrc ADDED Viewed

@@ -0,0 +1,5 @@
+if ! has nix_direnv_version || ! nix_direnv_version 2.4.0; then
+    source_url "https://raw.githubusercontent.com/nix-community/nix-direnv/2.4.0/direnvrc" "sha256-XQzUAvL6pysIJnRJyR7uVpmUSZfc7LSgWQwq/4mBr1U="
+fi
+source_env_if_exists .envrc.secrets
+source_env_if_exists .envrc.user

boring_semantic_layer-0.0.1/.envrc.user ADDED Viewed

	@@ -0,0 +1 @@
1	+ use flake .#editable

boring_semantic_layer-0.0.1/.gitignore ADDED Viewed

@@ -0,0 +1,11 @@
+.envrc.secrets
+builds/
+*pyc
+.gitignore
+.direnv/*
+**__pycache__**
+archives/**
+.cursor/*
+.DS_Store
+.vscode/*
+malloy-samples/

boring_semantic_layer-0.0.1/.gitignore.template ADDED Viewed

@@ -0,0 +1,5 @@
+.envrc.secrets
+builds/
+*pyc
+.gitignore
+.direnv/*

boring_semantic_layer-0.0.1/.pre-commit-config.yaml ADDED Viewed

@@ -0,0 +1,33 @@
+ci:
+  autofix_commit_msg: "style: auto fixes from pre-commit.ci hooks"
+  autofix_prs: false
+  autoupdate_commit_msg: "chore(deps): pre-commit.ci autoupdate"
+  skip:
+    - ruff
+default_stages:
+  - pre-commit
+repos:
+  - repo: https://github.com/codespell-project/codespell
+    rev: v2.4.1
+    hooks:
+      - id: codespell
+        additional_dependencies:
+          - tomli
+        args: [--ignore-words=.codespell.ignore-words]
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    # Ruff version.
+    rev: v0.11.13
+    hooks:
+      # Run the linter.
+      - id: ruff
+        args: ["--output-format=full", "--fix", "src", "examples", "tests"]
+      # Run the formatter.
+      - id: ruff-format
+  - repo: https://github.com/astral-sh/uv-pre-commit
+    # uv version.
+    rev: 0.7.13
+    hooks:
+      # Update the uv lockfile
+      - id: uv-lock
+      - id: uv-export
+        args: ["--frozen", "--no-hashes", "--no-emit-project", "--all-groups", "--all-extras", "--output-file=requirements-dev.txt"]

boring_semantic_layer-0.0.1/PKG-INFO ADDED Viewed

@@ -0,0 +1,495 @@
+Metadata-Version: 2.4
+Name: boring-semantic-layer
+Version: 0.0.1
+Summary: A boring semantic layer built with ibis
+Requires-Python: >=3.12
+Requires-Dist: ibis-framework>=10.6.0
+Requires-Dist: urllib3>=2.2.3
+Provides-Extra: examples
+Requires-Dist: duckdb>=1.3.1; extra == 'examples'
+Requires-Dist: xorq>=2.2.0; extra == 'examples'
+Provides-Extra: xorq
+Requires-Dist: xorq>=2.2.0; extra == 'xorq'
+Description-Content-Type: text/markdown
+# Boring Semantic Layer (BSL)
+The Boring Semantic Layer (BSL) is a lightweight semantic layer based on [Ibis](https://ibis-project.org/).
+**Key Features:**
+- **Lightweight**: `pip install boring-semantic-layer`
+- **Ibis-powered**: Built on top of [Ibis](https://ibis-project.org/), supporting any database engine that Ibis integrates with (DuckDB, Snowflake, BigQuery, PostgreSQL, and more)
+- **MCP-friendly**: Perfect for connecting Large Language Models to structured data sources
+*This project is a joint effort by [xorq-labs](https://github.com/xorq-labs) and [boringdata](https://www.boringdata.io/).*
+We welcome feedback and contributions!
+# Quick Example
+**1. Define your ibis input table**
+```python
+import ibis
+flights_tbl = ibis.table(
+    name="flights",
+    schema={"origin": "string", "carrier": "string"}
+)
+```
+**2. Define a semantic model**
+```python
+from boring_semantic_layer import SemanticModel
+flights_sm = SemanticModel(
+    table=flights_tbl,
+    dimensions={"origin": lambda t: t.origin},
+    measures={"flight_count": lambda t: t.count()}
+)
+```
+**3. Query it**
+```python
+flights_sm.query(
+    dimensions=["origin"],
+    measures=["flight_count"]
+).execute()
+```
+**Example output (dataframe):**
+| origin | flight_count |
+|--------|--------------|
+| JFK    | 3689         |
+| LGA    | 2941         |
+| ...    | ...          |
+-----
+## Table of Contents
+- [Installation](#installation)
+- [Get Started](#get-started)
+  1. [Get Sample Data](#1-get-sample-data)
+  2. [Build a Semantic Model](#2-build-a-semantic-model)
+  3. [Query a Semantic Model](#3-query-a-semantic-model)
+- [Features](#features)
+  - [Filters](#filters)
+    - [Ibis Expression](#ibis-expression)
+    - [JSON-based (MCP & LLM friendly)](#json-based-mcp-llm-friendly)
+  - [Time-Based Dimensions and Queries](#time-based-dimensions-and-queries)
+  - [Joins Across Semantic Models](#joins-across-semantic-models)
+    - [Classic SQL Joins](#classic-sql-joins)
+    - [join_one](#join_one)
+    - [join_many](#join_many)
+    - [join_cross](#join_cross)
+- [Reference](#reference)
+  - [SemanticModel](#semanticmodel)
+  - [Query (SemanticModel.query / QueryExpr)](#query-semanticmodelquery--queryexpr)
+-----
+## Installation
+```bash
+pip install boring-semantic-layer
+```
+-----
+## Get Started
+### 1. Get Sample Data
+We'll use a public flight dataset from the [Malloy Samples repository](https://github.com/malloydata/malloy-samples/tree/main/data).
+```bash
+git clone https://github.com/malloydata/malloy-samples
+```
+### 2. Build a Semantic Model
+Define your data source and create a semantic model that describes your data in terms of dimensions and measures.
+```python
+import ibis
+from boring_semantic_layer import SemanticModel
+# Connect to your database (here, DuckDB in-memory for demo)
+con = ibis.duckdb.connect(":memory:")
+flights_tbl = con.read_parquet("malloy-samples/data/flights.parquet")
+# Define the semantic model
+flights_sm = SemanticModel(
+    name="flights",
+    table=flights_tbl,
+    dimensions={
+        'origin': lambda t: t.origin,
+        'destination': lambda t: t.dest,
+        'year': lambda t: t.year
+    },
+    measures={
+        'total_flights': lambda t: t.count(),
+        'total_distance': lambda t: t.distance.sum(),
+        'avg_distance': lambda t: t.distance.mean(),
+    }
+)
+```
+- **Dimensions** are attributes to group or filter by (e.g., origin, destination).
+- **Measures** are aggregations or calculations (e.g., total flights, average distance).
+All dimensions and measures are defined as Ibis expressions.
+Ibis expressions are Python functions that represent database operations.
+They allow you to write database queries using familiar Python syntax while Ibis handles the translation to optimized SQL for your specific database backend (like DuckDB, PostgreSQL, BigQuery, etc.).
+For example, in our semantic model:
+- `lambda t: t.origin` is an Ibis expression that references the "origin" column
+- `lambda t: t.count()` is an Ibis expression that counts rows
+- `lambda t: t.distance.mean()` is an Ibis expression that calculates the average distance
+The `t` parameter represents the table, and you can chain operations like `t.origin.upper()` or `t.dep_delay > 0` to create complex expressions. Ibis ensures these expressions are translated to efficient SQL queries.
+---
+### 3. Query a Semantic Model
+Use your semantic model to run queries—selecting dimensions, measures, and applying filters or limits.
+```python
+flights_sm.query(
+    dimensions=['origin'],
+    measures=['total_flights', 'avg_distance'],
+    limit=10
+).execute()
+```
+Example output:
+| origin | total_flights | avg_distance |
+|--------|---------------|--------------|
+| JFK    | 3689          | 1047.71      |
+| PHL    | 7708          | 1044.97      |
+| ...    | ...           | ...          |
+-----
+## Features
+### Filters
+#### Ibis Expression
+The `query` method can filter data using raw Ibis expressions for full flexibility.
+```python
+flights_sm.query(
+    dimensions=['origin'],
+    measures=['total_flights'],
+    filters=[
+        lambda t: t.origin == 'JFK'
+    ]
+)
+```
+| origin | total_flights |
+|--------|---------------|
+| JFK    | 3689          |
+#### JSON-based (MCP & LLM friendly)
+A format that's easy to serialize, good for dynamic queries or LLM integration.
+```python
+flights_sm.query(
+    dimensions=['origin'],
+    measures=['total_flights'],
+    filters=[
+        {
+            'operator': 'AND',
+            'conditions': [
+                {'field': 'origin', 'operator': 'in', 'values': ['JFK', 'LGA', 'PHL']},
+                {'field': 'total_flights', 'operator': '>', 'value': 5000}
+            ]
+        }
+    ]
+).execute()
+```
+BSL supports the following operators: `=`, `!=`, `>`, `>=`, `in`, `not in`, `like`, `not like`, `is null`, `is not null`, `AND`, `OR`
+### Time-Based Dimensions and Queries
+BSL has built-in support for flexible time-based analysis.
+To use it, define a `time_dimension` in your `SemanticModel` that points to a timestamp column.
+You can also set `smallest_time_grain` to prevent incorrect time aggregations.
+```python
+flights_sm_with_time = SemanticModel(
+    name="flights_timed",
+    table=flights_tbl,
+    dimensions={
+        'origin': lambda t: t.origin,
+        'destination': lambda t: t.dest,
+        'year': lambda t: t.year,
+    },
+    measures={
+        'total_flights': lambda t: t.count(),
+    },
+    time_dimension='dep_time', # The column containing timestamps. Crucial for time-based queries.
+    smallest_time_grain='TIME_GRAIN_SECOND' # Optional: sets the lowest granularity (e.g., DAY, MONTH).
+)
+# With the time dimension defined, you can query using a specific time range and grain.
+query_time_based_df = flights_sm_with_time.query(
+    dims=['origin'],
+    measures=['total_flights'],
+    time_range={'start': '2013-01-01', 'end': '2013-01-31'},
+    time_grain='TIME_GRAIN_DAY' # Use specific TIME_GRAIN constants
+).execute()
+print(query_time_based_df)
+```
+Example output:
+| origin | arr_time   | flight_count |
+|--------|------------|--------------|
+| PHL    | 2013-01-01 | 5            |
+| CLE    | 2013-01-01 | 5            |
+| DFW    | 2013-01-01 | 7            |
+| DFW    | 2013-01-02 | 9            |
+| DFW    | 2013-01-03 | 13           |
+### Joins Across Semantic Models
+BSL allows you to join multiple `SemanticModel` instances to enrich your data. Joins are defined in the `joins` parameter of a `SemanticModel`.
+There are four main ways to define joins:
+#### Classic SQL Joins
+For full control, you can create a `Join` object directly, specifying the join condition with an `on` lambda function and the join type with `how` (e.g., `'inner'`, `'left'`).
+First, let's define two semantic models: one for flights and one for carriers.
+The flight model resulting from a join with the carriers model:
+```python
+from boring_semantic_layer import  Join, SemanticModel
+import ibis
+import os
+# Assume `con` is an existing Ibis connection from the Quickstart example.
+con = ibis.duckdb.connect(":memory:")
+# Load the required tables from the sample data
+flights_tbl = con.read_parquet("malloy-samples/data/flights.parquet")
+carriers_tbl = con.read_parquet("malloy-samples/data/carriers.parquet")
+# First, define the 'carriers' semantic model to join with.
+carriers_sm = SemanticModel(
+    name="carriers",
+    table=carriers_tbl,
+    dimensions={
+        "code": lambda t: t.code,
+        "name": lambda t: t.name,
+        "nickname": lambda t: t.nickname,
+    },
+    measures={
+        "carrier_count": lambda t: t.count(),
+    }
+)
+# Now, define the 'flights' semantic model with a join to 'carriers'
+flight_sm = SemanticModel(
+    name="flights",
+    table=flights_tbl,
+    dimensions={
+        "origin": lambda t: t.origin,
+        "destination": lambda t: t.destination,
+        "carrier": lambda t: t.carrier, # This is the join key
+    },
+    measures={
+        "flight_count": lambda t: t.count(),
+    },
+    joins={
+        "carriers": Join(
+            model=carriers_sm,
+            on=lambda left, right: left.carrier == right.code,
+        ),
+    }
+)
+# Querying across the joined models to get flight counts by carrier name
+query_joined_df = flight_sm.query(
+    dims=['carriers.name', 'origin'],
+    measures=['flight_count'],
+    limit=10
+).execute()
+```
+| carriers_name              | origin | flight_count |
+|---------------------------|--------|--------------|
+| Delta Air Lines           | MDT    | 235          |
+| Delta Air Lines           | ATL    | 8419         |
+| Comair (Delta Connections)| ATL    | 239          |
+| American Airlines         | DFW    | 8742         |
+| American Eagle Airlines   | JFK    | 418          |
+#### join_one
+For common join patterns, BSL provides helper class methods inspired by [Malloy](https://docs.malloydata.dev/documentation/language/join): `Join.one`, `Join.many`, and `Join.cross`.
+These simplify joins based on primary/foreign key relationships.
+To use them, first define a `primary_key` on the model you are joining to. The primary key should be one of the model's dimensions.
+```python
+carriers_pk_sm = SemanticModel(
+    name="carriers",
+    table=con.read_parquet("malloy-samples/data/carriers.parquet"),
+    primary_key="code",
+    dimensions={
+        'code': lambda t: t.code,
+        'name': lambda t: t.name
+    },
+    measures={'carrier_count': lambda t: t.count()}
+)
+```
+Now, you can use `Join.one` in the `flights` model to link to `carriers_pk_sm`. The `with_` parameter specifies the foreign key on the `flights` model.
+```python
+from boring_semantic_layer import Join
+flights_with_join_one_sm = SemanticModel(
+    name="flights",
+    table=flights_tbl,
+    dimensions={'origin': lambda t: t.origin},
+    measures={'flight_count': lambda t: t.count()},
+    joins={
+        "carriers": Join.one(
+            alias="carriers",
+            model=carriers_pk_sm,
+            with_=lambda t: t.carrier
+        )
+    }
+)
+```
+- **`Join.one(alias, model, with_)`**: Use for one-to-one or many-to-one relationships. It joins where the foreign key specified in `with_` matches the `primary_key` of the joined `model`.
+#### join_many
+- **`Join.many(alias, model, with_)`**: Similar to `Join.one`, but semantically represents a one-to-many relationship.
+#### join_cross
+- **`Join.cross(alias, model)`**: Creates a cross product, joining every row from the left model with every row of the right `model`.
+Querying remains the same—just reference the joined fields using the alias.
+```python
+flights_with_join_one_sm.query(
+    dimensions=["carriers.name"],
+    measures=["flight_count"],
+    limit=5
+).execute()
+```
+Example output:
+| carriers_name            | flight_count |
+|-------------------------|--------------|
+| Delta Air Lines         | 10000        |
+| American Airlines       | 9000         |
+| United Airlines         | 8500         |
+| Southwest Airlines      | 8000         |
+| JetBlue Airways         | 7500         |
+## Reference
+### SemanticModel
+| Field                | Type                                      | Required | Allowed Values / Notes                                                                                      |
+|----------------------|-------------------------------------------|----------|------------------------------------------------------------------------------------------------------------|
+| `table`              | Ibis table expression                     | Yes      | Any Ibis table or view                                                                                     |
+| `dimensions`         | dict[str, callable]                       | Yes      | Keys: dimension names; Values: functions mapping table → column                                             |
+| `measures`           | dict[str, callable]                       | Yes      | Keys: measure names; Values: functions mapping table → aggregation                                          |
+| `joins`              | dict[str, Join]                           | No       | Keys: join alias; Values: `Join` object (see below)                                                         |
+| `primary_key`        | str                                       | No       | Name of the primary key dimension (required for certain join types)                                         |
+| `name`               | str                                       | No       | Optional model name (inferred from table if omitted)                                                        |
+| `time_dimension`     | str                                       | No       | Name of the column to use as the time dimension                                                             |
+| `smallest_time_grain`| str                                       | No       | One of:<br>`TIME_GRAIN_SECOND`, `TIME_GRAIN_MINUTE`, `TIME_GRAIN_HOUR`, `TIME_GRAIN_DAY`,<br>`TIME_GRAIN_WEEK`, `TIME_GRAIN_MONTH`, `TIME_GRAIN_QUARTER`, `TIME_GRAIN_YEAR` |
+#### Join object (for `joins`)
+- Use `Join.one(alias, model, with_)` for one-to-one/many-to-one
+- Use `Join.many(alias, model, with_)` for one-to-many
+- Use `Join.cross(alias, model)` for cross join
+---
+### Query (SemanticModel.query / QueryExpr)
+| Parameter      | Type                                              | Required | Allowed Values / Notes                                                                                      |
+|----------------|---------------------------------------------------|----------|------------------------------------------------------------------------------------------------------------|
+| `dimensions`   | list[str]                                         | No       | List of dimension names (can include joined fields, e.g. `"carriers.name"`)                                 |
+| `measures`     | list[str]                                         | No       | List of measure names (can include joined fields)                                                           |
+| `filters`      | list[dict/str/callable] or dict/str/callable      | No       | See below for filter formats and operators                                                                  |
+| `order_by`     | list[tuple[str, str]]                             | No       | List of (field, direction) tuples, e.g. `[("avg_delay", "desc")]`                                           |
+| `limit`        | int                                               | No       | Maximum number of rows to return                                                                            |
+| `time_range`   | dict with `start` and `end` (ISO 8601 strings)    | No       | Example: `{'start': '2024-01-01', 'end': '2024-12-31'}`                                                     |
+| `time_grain`   | str                                               | No       | One of:<br>`TIME_GRAIN_SECOND`, `TIME_GRAIN_MINUTE`, `TIME_GRAIN_HOUR`, `TIME_GRAIN_DAY`,<br>`TIME_GRAIN_WEEK`, `TIME_GRAIN_MONTH`, `TIME_GRAIN_QUARTER`, `TIME_GRAIN_YEAR` |
+#### Filters
+- **Simple filter (dict):**
+  ```python
+  {"field": "origin", "operator": "=", "value": "JFK"}
+  ```
+- **Compound filter (dict):**
+  ```python
+  {
+    "operator": "AND",
+    "conditions": [
+      {"field": "origin", "operator": "in", "values": ["JFK", "LGA"]},
+      {"field": "year", "operator": ">", "value": 2010}
+    ]
+  }
+  ```
+- **Callable:** `lambda t: t.origin == 'JFK'`
+- **String:** `"_.origin == 'JFK'"`
+**Supported operators:** `=`, `!=`, `>`, `>=`, `<`, `<=`, `in`, `not in`, `like`, `not like`, `is null`, `is not null`, `AND`, `OR`
+#### Example
+```python
+flights_sm.query(
+    dimensions=['origin', 'year'],
+    measures=['total_flights'],
+    filters=[
+        {"field": "origin", "operator": "in", "values": ["JFK", "LGA"]},
+        {"field": "year", "operator": ">", "value": 2010}
+    ],
+    order_by=[('total_flights', 'desc')],
+    limit=10,
+    time_range={'start': '2015-01-01', 'end': '2015-12-31'},
+    time_grain='TIME_GRAIN_MONTH'
+)
+```
+Example output:
+| origin | year | total_flights |
+|--------|------|---------------|
+| JFK    | 2015 | 350           |
+| LGA    | 2015 | 300           |