
agentic-data-contracts 0.2.4 (tar.gz) → 0.2.6 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (81)

{agentic_data_contracts-0.2.4 → agentic_data_contracts-0.2.6}/CHANGELOG.md RENAMED

@@ -2,6 +2,23 @@
 All notable changes to this project will be documented in this file.
+## [0.2.6] - 2026-03-29
+### Changed
+- **Compact system prompt at scale**: When metrics exceed 20, the system prompt shows domain names with counts (e.g., "acquisition (45)") instead of listing every metric. Reduces prompt from ~6K to ~100 tokens for large metric sets.
+- **Paginated `list_tables`**: Added `limit` (default 50) and `offset` parameters for handling schemas with many tables. Response includes `total` count and `next_offset` for pagination.
+- **Cached wildcard resolution**: `resolve_tables()` is now idempotent — subsequent calls are no-ops, avoiding redundant database queries.
+## [0.2.5] - 2026-03-29
+### Added
+- **Table relationship metadata**: `Relationship` dataclass and `get_relationships()` on `SemanticSource` protocol for declaring join paths between tables (from/to column + relationship type)
+- **Relationships in system prompt**: `to_system_prompt()` includes join paths so the agent knows how to combine tables correctly
+- **YamlSource relationships**: Parsed from `relationships` section in semantic YAML files
+- DbtSource and CubeSource return empty relationships (ready for future parsing of native join metadata)
 ## [0.2.4] - 2026-03-29
 ### Added

{agentic_data_contracts-0.2.4 → agentic_data_contracts-0.2.6}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: agentic-data-contracts
-Version: 0.2.4
+Version: 0.2.6
 Summary: YAML-first data contract governance for AI agents
 Project-URL: Homepage, https://github.com/flyersworder/agentic-data-contracts
 Project-URL: Repository, https://github.com/flyersworder/agentic-data-contracts
@@ -277,6 +277,23 @@ semantic:
     path: "./cube/schema.yml"
 ```
+## Table Relationships
+Define join paths so the agent knows how to combine tables correctly:
+```yaml
+# semantic.yml
+relationships:
+  - from: analytics.orders.customer_id
+    to: analytics.customers.id
+    type: many_to_one
+  - from: analytics.orders.product_id
+    to: analytics.products.id
+    type: many_to_one
+```
+The agent sees these in its system prompt and uses them to write correct JOINs instead of guessing from column names.
 ## Scalable Metric Discovery
 For large data lakes with hundreds of KPIs, group metrics by domain and let the agent discover them efficiently:
@@ -297,6 +314,18 @@ lookup_metric("acquisition cost")   → fuzzy match, returns [CAC, CPA] as candi
 list_metrics(domain="retention")    → only retention metrics
 ```
+## Scaling to Large Organizations
+Tested for 200+ tables, 300+ metrics, 50+ relationships across multiple schemas.
+| Concern | How it scales |
+|---|---|
+| **System prompt size** | >20 metrics: auto-switches to compact domain counts (`acquisition (45)`) instead of listing every metric |
+| **Table discovery** | `list_tables` is paginated (default 50, with offset). Use `schema` filter for targeted browsing |
+| **Wildcard schemas** | `tables: ["*"]` discovers tables from the database. Resolution is cached — no repeated queries |
+| **Metric lookup** | Fuzzy search via `thefuzz` (C++ backed) — sub-millisecond even with 1000+ metrics |
+| **SQL validation** | Set-based allowlist check — O(1) per table reference regardless of allowlist size |
 ## Resource Limits
 ```yaml

{agentic_data_contracts-0.2.4 → agentic_data_contracts-0.2.6}/README.md RENAMED

@@ -224,6 +224,23 @@ semantic:
     path: "./cube/schema.yml"
 ```
+## Table Relationships
+Define join paths so the agent knows how to combine tables correctly:
+```yaml
+# semantic.yml
+relationships:
+  - from: analytics.orders.customer_id
+    to: analytics.customers.id
+    type: many_to_one
+  - from: analytics.orders.product_id
+    to: analytics.products.id
+    type: many_to_one
+```
+The agent sees these in its system prompt and uses them to write correct JOINs instead of guessing from column names.
 ## Scalable Metric Discovery
 For large data lakes with hundreds of KPIs, group metrics by domain and let the agent discover them efficiently:
@@ -244,6 +261,18 @@ lookup_metric("acquisition cost")   → fuzzy match, returns [CAC, CPA] as candi
 list_metrics(domain="retention")    → only retention metrics
 ```
+## Scaling to Large Organizations
+Tested for 200+ tables, 300+ metrics, 50+ relationships across multiple schemas.
+| Concern | How it scales |
+|---|---|
+| **System prompt size** | >20 metrics: auto-switches to compact domain counts (`acquisition (45)`) instead of listing every metric |
+| **Table discovery** | `list_tables` is paginated (default 50, with offset). Use `schema` filter for targeted browsing |
+| **Wildcard schemas** | `tables: ["*"]` discovers tables from the database. Resolution is cached — no repeated queries |
+| **Metric lookup** | Fuzzy search via `thefuzz` (C++ backed) — sub-millisecond even with 1000+ metrics |
+| **SQL validation** | Set-based allowlist check — O(1) per table reference regardless of allowlist size |
 ## Resource Limits
 ```yaml

{agentic_data_contracts-0.2.4 → agentic_data_contracts-0.2.6}/examples/revenue_agent/contract.yml RENAMED

@@ -9,12 +9,15 @@ semantic:
     - schema: analytics
       tables: [orders, customers, subscriptions]
   forbidden_operations: [DELETE, DROP, TRUNCATE, UPDATE, INSERT]
+  domains:
+    revenue: [total_revenue, revenue_by_region]
   rules:
     - name: tenant_isolation
       description: "All queries must filter by tenant_id"
       enforcement: block
+      filter_column: tenant_id
     - name: use_semantic_revenue
-      description: "Revenue calculations must use the dbt metric definition"
+      description: "Revenue calculations must use the metric definitions"
       enforcement: warn
     - name: no_select_star
       description: "Must specify explicit columns"

agentic_data_contracts-0.2.6/examples/revenue_agent/semantic.yml ADDED

@@ -0,0 +1,22 @@
+# Semantic source — define only what the database can't tell the agent.
+# Table columns are discovered at runtime via the describe_table tool.
+metrics:
+  - name: total_revenue
+    description: "Total revenue from completed orders"
+    sql_expression: "SUM(amount) FILTER (WHERE status = 'completed')"
+    source_model: analytics.orders
+    filters:
+      - "status = 'completed'"
+  - name: revenue_by_region
+    description: "Revenue broken down by customer region"
+    sql_expression: "SUM(o.amount) GROUP BY c.region"
+    source_model: analytics.orders
+    filters:
+      - "o.status = 'completed'"
+relationships:
+  - from: analytics.orders.customer_id
+    to: analytics.customers.id
+    type: many_to_one

{agentic_data_contracts-0.2.4 → agentic_data_contracts-0.2.6}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "agentic-data-contracts"
-version = "0.2.4"
+version = "0.2.6"
 description = "YAML-first data contract governance for AI agents"
 readme = "README.md"
 requires-python = ">=3.12"

{agentic_data_contracts-0.2.4 → agentic_data_contracts-0.2.6}/src/agentic_data_contracts/core/contract.py RENAMED

@@ -23,6 +23,7 @@ class DataContract:
     def __init__(self, schema: DataContractSchema) -> None:
         self.schema = schema
+        self._tables_resolved: bool = False
     @property
     def name(self) -> str:
@@ -43,16 +44,18 @@ class DataContract:
         """Check if any schema uses wildcard ('*') for tables."""
         return any("*" in entry.tables for entry in self.schema.semantic.allowed_tables)
-    def resolve_tables(self, adapter: DatabaseAdapter) -> None:
+    def resolve_tables(self, adapter: DatabaseAdapter, *, force: bool = False) -> None:
         """Expand wildcard tables using the database adapter.
         Replaces ["*"] entries with actual table names from the database.
-        Call this once after creating the adapter. Results are cached
-        on the schema object.
+        Results are cached — subsequent calls are no-ops unless force=True.
         """
+        if self._tables_resolved and not force:
+            return
         for entry in self.schema.semantic.allowed_tables:
             if "*" in entry.tables:
                 entry.tables = adapter.list_tables(entry.schema_)
+        self._tables_resolved = True
     def allowed_table_names(self) -> list[str]:
         names: list[str] = []
@@ -171,6 +174,17 @@ class DataContract:
             )
             sections.append(line)
+        # Table relationships
+        if semantic_source is not None:
+            rels = semantic_source.get_relationships()
+            if rels:
+                sections.append(
+                    "\n### Table Relationships\n"
+                    "Use these join paths when combining tables:"
+                )
+                for r in rels:
+                    sections.append(f"- {r.from_} \u2192 {r.to} ({r.type})")
         # Resource limits
         res = self.schema.resources
         if res:
@@ -193,6 +207,10 @@ class DataContract:
         return "\n".join(sections)
+    # Max metrics to list individually in system prompt before switching
+    # to compact domain-only summaries.
+    METRIC_DETAIL_THRESHOLD = 20
     def _build_metrics_section(
         self, semantic_source: SemanticSource | None
     ) -> str | None:
@@ -205,11 +223,27 @@ class DataContract:
         domains = self.schema.semantic.domains
         lines: list[str] = []
-        lines.append(
-            "\n### Available Metrics (use lookup_metric for full SQL definitions)"
-        )
+        compact = len(metrics) > self.METRIC_DETAIL_THRESHOLD
-        if domains:
+        if compact and domains:
+            # Large metric set with domains — show counts only
+            lines.append("\n### Available Metrics")
+            metric_names = {m.name for m in metrics}
+            domain_parts = []
+            for domain, names in domains.items():
+                count = sum(1 for n in names if n in metric_names)
+                if count:
+                    domain_parts.append(f"{domain} ({count})")
+            lines.append(f"Domains: {', '.join(domain_parts)}")
+            lines.append(
+                '\nUse list_metrics(domain="...") to browse,'
+                ' lookup_metric("...") to get SQL definitions.'
+            )
+        elif domains:
+            # Small metric set with domains — list with descriptions
+            lines.append(
+                "\n### Available Metrics (use lookup_metric for full SQL definitions)"
+            )
             metric_map = {m.name: m for m in metrics}
             for domain, names in domains.items():
                 entries = []
@@ -219,12 +253,27 @@ class DataContract:
                         entries.append(f"{m.name} \u2014 {m.description}")
                 if entries:
                     lines.append(f"**{domain}:** {', '.join(entries)}")
+            lines.append(
+                "\nUse the lookup_metric tool to get the SQL definition"
+                " before computing any KPI."
+            )
+        elif compact:
+            # Large metric set without domains — just show count
+            lines.append("\n### Available Metrics")
+            lines.append(f"{len(metrics)} metrics available.")
+            lines.append(
+                "\nUse list_metrics() to browse,"
+                ' lookup_metric("...") to get SQL definitions.'
+            )
         else:
+            # Small metric set without domains — list all
+            lines.append(
+                "\n### Available Metrics (use lookup_metric for full SQL definitions)"
+            )
             for m in metrics:
                 lines.append(f"- {m.name} \u2014 {m.description}")
-        lines.append(
-            "\nUse the lookup_metric tool to get the SQL definition"
-            " before computing any KPI."
-        )
+            lines.append(
+                "\nUse the lookup_metric tool to get the SQL definition"
+                " before computing any KPI."
+            )
         return "\n".join(lines)
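
The branch on `METRIC_DETAIL_THRESHOLD` above can be illustrated in isolation. The following is a minimal sketch, not the library's code: `metrics_summary` is a hypothetical helper that reproduces only the two compact branches and collapses both non-compact branches into a plain comma-separated list of names.

```python
METRIC_DETAIL_THRESHOLD = 20  # same value as the class constant in the hunk above


def metrics_summary(metric_names: list[str], domains: dict[str, list[str]]) -> str:
    """Hypothetical helper: per-domain counts when compact, plain names otherwise."""
    compact = len(metric_names) > METRIC_DETAIL_THRESHOLD
    known = set(metric_names)
    if compact and domains:
        parts = []
        for domain, names in domains.items():
            # Count only names that the semantic source actually defines.
            count = sum(1 for n in names if n in known)
            if count:
                parts.append(f"{domain} ({count})")
        return "Domains: " + ", ".join(parts)
    if compact:
        return f"{len(metric_names)} metrics available."
    # Simplification: the real method also groups by domain in this case.
    return ", ".join(metric_names)


names = [f"metric_{i}" for i in range(30)]
domains = {"domain_a": names[:15], "domain_b": names[15:]}
print(metrics_summary(names, domains))  # Domains: domain_a (15), domain_b (15)
print(metrics_summary(names, {}))       # 30 metrics available.
print(metrics_summary(names[:3], {}))   # metric_0, metric_1, metric_2
```

Because the comparison is strictly greater-than, a set of exactly 20 metrics is still listed individually, which matches the boundary test added in this release.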

{agentic_data_contracts-0.2.4 → agentic_data_contracts-0.2.6}/src/agentic_data_contracts/semantic/base.py RENAMED

@@ -20,12 +20,20 @@ class MetricDefinition:
     filters: list[str] = field(default_factory=list)
+@dataclass
+class Relationship:
+    from_: str  # "schema.table.column"
+    to: str  # "schema.table.column"
+    type: str = "many_to_one"  # many_to_one | one_to_one | many_to_many
 @runtime_checkable
 class SemanticSource(Protocol):
     def get_metrics(self) -> list[MetricDefinition]: ...
     def get_metric(self, name: str) -> MetricDefinition | None: ...
     def get_table_schema(self, schema: str, table: str) -> TableSchema | None: ...
     def search_metrics(self, query: str) -> list[MetricDefinition]: ...
+    def get_relationships(self) -> list[Relationship]: ...
 def fuzzy_search_metrics(
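
The `Relationship` fields hold dotted `schema.table.column` strings, as the inline comments in the hunk indicate. The sketch below shows one way a consumer might turn such a pair into a JOIN clause. `split_ref` and `join_clause` are hypothetical helpers, not part of the package, and the dataclass is redeclared here only so the example is self-contained.

```python
from dataclasses import dataclass


@dataclass
class Relationship:  # local copy of the dataclass shown in the hunk above
    from_: str  # "schema.table.column"
    to: str  # "schema.table.column"
    type: str = "many_to_one"


def split_ref(ref: str) -> tuple[str, str]:
    """Split 'schema.table.column' into ('schema.table', 'column')."""
    table, _, column = ref.rpartition(".")
    if not table or not column:
        raise ValueError(f"expected 'schema.table.column', got {ref!r}")
    return table, column


def join_clause(rel: Relationship) -> str:
    """Hypothetical helper: render the join path as a SQL JOIN ... ON clause."""
    from_table, from_col = split_ref(rel.from_)
    to_table, to_col = split_ref(rel.to)
    return f"JOIN {to_table} ON {from_table}.{from_col} = {to_table}.{to_col}"


rel = Relationship(
    from_="analytics.orders.customer_id",
    to="analytics.customers.id",
)
print(join_clause(rel))
# JOIN analytics.customers ON analytics.orders.customer_id = analytics.customers.id
```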

{agentic_data_contracts-0.2.4 → agentic_data_contracts-0.2.6}/src/agentic_data_contracts/semantic/cube.py RENAMED

@@ -7,7 +7,11 @@ from pathlib import Path
 import yaml
 from agentic_data_contracts.adapters.base import Column, TableSchema
-from agentic_data_contracts.semantic.base import MetricDefinition, fuzzy_search_metrics
+from agentic_data_contracts.semantic.base import (
+    MetricDefinition,
+    Relationship,
+    fuzzy_search_metrics,
+)
 class CubeSource:
@@ -54,5 +58,8 @@ class CubeSource:
     def search_metrics(self, query: str) -> list[MetricDefinition]:
         return fuzzy_search_metrics(self._metrics, self.get_metric, query)
+    def get_relationships(self) -> list[Relationship]:
+        return []  # TODO: parse from Cube joins config
     def get_table_schema(self, schema: str, table: str) -> TableSchema | None:
         return self._tables.get(f"{schema}.{table}")

{agentic_data_contracts-0.2.4 → agentic_data_contracts-0.2.6}/src/agentic_data_contracts/semantic/dbt.py RENAMED

@@ -7,7 +7,11 @@ from pathlib import Path
 from typing import Any
 from agentic_data_contracts.adapters.base import Column, TableSchema
-from agentic_data_contracts.semantic.base import MetricDefinition, fuzzy_search_metrics
+from agentic_data_contracts.semantic.base import (
+    MetricDefinition,
+    Relationship,
+    fuzzy_search_metrics,
+)
 class DbtSource:
@@ -77,5 +81,8 @@ class DbtSource:
     def search_metrics(self, query: str) -> list[MetricDefinition]:
         return fuzzy_search_metrics(self._metrics, self.get_metric, query)
+    def get_relationships(self) -> list[Relationship]:
+        return []  # TODO: parse from dbt manifest relationships/refs
     def get_table_schema(self, schema: str, table: str) -> TableSchema | None:
         return self._tables.get(f"{schema}.{table}")

{agentic_data_contracts-0.2.4 → agentic_data_contracts-0.2.6}/src/agentic_data_contracts/semantic/yaml_source.py RENAMED

@@ -7,7 +7,11 @@ from pathlib import Path
 import yaml
 from agentic_data_contracts.adapters.base import Column, TableSchema
-from agentic_data_contracts.semantic.base import MetricDefinition, fuzzy_search_metrics
+from agentic_data_contracts.semantic.base import (
+    MetricDefinition,
+    Relationship,
+    fuzzy_search_metrics,
+)
 class YamlSource:
@@ -38,6 +42,14 @@ class YamlSource:
                     for c in t.get("columns", [])
                 ]
             )
+        self._relationships = [
+            Relationship(
+                from_=r["from"],
+                to=r["to"],
+                type=r.get("type", "many_to_one"),
+            )
+            for r in raw.get("relationships", [])
+        ]
     def get_metrics(self) -> list[MetricDefinition]:
         return list(self._metrics)
@@ -51,5 +63,8 @@ class YamlSource:
     def search_metrics(self, query: str) -> list[MetricDefinition]:
         return fuzzy_search_metrics(self._metrics, self.get_metric, query)
+    def get_relationships(self) -> list[Relationship]:
+        return list(self._relationships)
     def get_table_schema(self, schema: str, table: str) -> TableSchema | None:
         return self._tables.get(f"{schema}.{table}")

{agentic_data_contracts-0.2.4 → agentic_data_contracts-0.2.6}/src/agentic_data_contracts/tools/factory.py RENAMED

@@ -60,14 +60,23 @@ def create_tools(
     # ── Tool 2: list_tables ───────────────────────────────────────────────────
     async def list_tables(args: dict[str, Any]) -> dict[str, Any]:
         schema_filter = args.get("schema")
-        tables: list[dict[str, Any]] = []
+        try:
+            limit = max(1, int(args.get("limit", 50)))
+        except (ValueError, TypeError):
+            limit = 50
+        try:
+            offset = max(0, int(args.get("offset", 0)))
+        except (ValueError, TypeError):
+            offset = 0
+        all_tables: list[dict[str, Any]] = []
         for entry in contract.schema.semantic.allowed_tables:
             if schema_filter and entry.schema_ != schema_filter:
                 continue
             if "*" in entry.tables:
                 return _text_response(
                     f"Schema '{entry.schema_}' uses wildcard tables"
-                    " but no database adapter is available to resolve them."
+                    " but no database adapter is available"
+                    " to resolve them."
                 )
             for table in entry.tables:
                 info: dict[str, Any] = {
@@ -78,8 +87,13 @@ def create_tools(
                     ts = semantic_source.get_table_schema(entry.schema_, table)
                     if ts is not None:
                         info["columns"] = [c.name for c in ts.columns]
-                tables.append(info)
-        return _text_response(json.dumps({"tables": tables}))
+                all_tables.append(info)
+        total = len(all_tables)
+        page = all_tables[offset : offset + limit]
+        result: dict[str, Any] = {"tables": page, "total": total}
+        if offset + limit < total:
+            result["next_offset"] = offset + limit
+        return _text_response(json.dumps(result))
     # ── Tool 3: describe_table ────────────────────────────────────────────────
     async def describe_table(args: dict[str, Any]) -> dict[str, Any]:
@@ -321,7 +335,8 @@ def create_tools(
             name="list_tables",
             description=(
                 "List allowed tables, optionally filtered by schema. "
-                "Includes column names when semantic source is available."
+                "Includes column names when semantic source is available. "
+                "Paginated \u2014 use limit/offset for large schemas."
             ),
             input_schema={
                 "type": "object",
@@ -329,7 +344,15 @@ def create_tools(
                     "schema": {
                         "type": "string",
                         "description": "Optional schema name to filter by",
-                    }
+                    },
+                    "limit": {
+                        "type": "integer",
+                        "description": "Max tables to return (default 50)",
+                    },
+                    "offset": {
+                        "type": "integer",
+                        "description": "Skip first N tables (default 0)",
+                    },
                 },
                 "required": [],
             },
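
The `limit`/`offset`/`next_offset` contract introduced in this file can be sketched without the tool machinery. In the example below, `paginate` mirrors the slicing logic from the hunk on a plain list, and `fetch_all` is a hypothetical client loop that follows `next_offset` until the key is absent. Neither name exists in the package.

```python
from typing import Any


def paginate(items: list[str], limit: int = 50, offset: int = 0) -> dict[str, Any]:
    """Slice one page and report 'next_offset' only when more items remain."""
    limit = max(1, limit)    # same clamping as in the hunk above
    offset = max(0, offset)
    total = len(items)
    result: dict[str, Any] = {"tables": items[offset : offset + limit], "total": total}
    if offset + limit < total:
        result["next_offset"] = offset + limit
    return result


def fetch_all(items: list[str], limit: int = 50) -> list[str]:
    """Hypothetical client: keep requesting pages until 'next_offset' is absent."""
    collected: list[str] = []
    offset: int | None = 0
    while offset is not None:
        page = paginate(items, limit=limit, offset=offset)
        collected.extend(page["tables"])
        offset = page.get("next_offset")
    return collected


tables = [f"table_{i}" for i in range(60)]
first = paginate(tables)
print(len(first["tables"]), first["total"], first["next_offset"])  # 50 60 50
print("next_offset" in paginate(tables, limit=10, offset=50))      # False
print(fetch_all(tables, limit=25) == tables)                        # True
```

Omitting `next_offset` on the last page, rather than returning a sentinel value, lets the caller use its absence as the stop condition.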

{agentic_data_contracts-0.2.4 → agentic_data_contracts-0.2.6}/tests/fixtures/semantic_source.yml RENAMED

@@ -29,3 +29,24 @@ tables:
       - name: status
         type: VARCHAR
         description: "Order status: pending, completed, cancelled"
+      - name: customer_id
+        type: INTEGER
+        description: "FK to customers"
+  - schema: analytics
+    table: customers
+    columns:
+      - name: id
+        type: INTEGER
+        description: "Primary key"
+      - name: name
+        type: VARCHAR
+        description: "Customer name"
+      - name: region
+        type: VARCHAR
+        description: "Geographic region"
+relationships:
+  - from: analytics.orders.customer_id
+    to: analytics.customers.id
+    type: many_to_one

agentic_data_contracts-0.2.6/tests/test_core/test_scalability.py ADDED

@@ -0,0 +1,144 @@
+"""Tests for scalability improvements: compact prompt, pagination, caching."""
+from unittest.mock import MagicMock
+from agentic_data_contracts.adapters.base import DatabaseAdapter
+from agentic_data_contracts.core.contract import DataContract
+from agentic_data_contracts.core.schema import (
+    AllowedTable,
+    DataContractSchema,
+    SemanticConfig,
+)
+from agentic_data_contracts.semantic.base import MetricDefinition, Relationship
+class FakeSemanticSource:
+    """Fake source with configurable metric count."""
+    def __init__(self, count: int) -> None:
+        self._metrics = [
+            MetricDefinition(
+                name=f"metric_{i}",
+                description=f"Description for metric {i}",
+                sql_expression=f"SUM(col_{i})",
+            )
+            for i in range(count)
+        ]
+    def get_metrics(self) -> list[MetricDefinition]:
+        return list(self._metrics)
+    def get_metric(self, name: str) -> MetricDefinition | None:
+        for m in self._metrics:
+            if m.name == name:
+                return m
+        return None
+    def get_table_schema(self, schema: str, table: str):  # noqa: ANN201
+        return None
+    def search_metrics(self, query: str) -> list[MetricDefinition]:
+        return []
+    def get_relationships(self) -> list[Relationship]:
+        return []
+def _make_contract_with_domains(
+    metric_names: list[str],
+) -> DataContract:
+    domains = {
+        "domain_a": metric_names[: len(metric_names) // 2],
+        "domain_b": metric_names[len(metric_names) // 2 :],
+    }
+    schema = DataContractSchema(
+        name="test",
+        semantic=SemanticConfig(
+            allowed_tables=[
+                AllowedTable.model_validate({"schema": "public", "tables": ["t"]}),
+            ],
+            domains=domains,
+        ),
+    )
+    return DataContract(schema)
+class TestCompactMetricPrompt:
+    def test_small_set_lists_all_metrics(self) -> None:
+        source = FakeSemanticSource(5)
+        dc = _make_contract_with_domains([f"metric_{i}" for i in range(5)])
+        prompt = dc.to_system_prompt(semantic_source=source)
+        # Should list individual metric descriptions
+        assert "metric_0 \u2014" in prompt
+        assert "metric_4 \u2014" in prompt
+    def test_large_set_shows_domain_counts(self) -> None:
+        source = FakeSemanticSource(30)
+        dc = _make_contract_with_domains([f"metric_{i}" for i in range(30)])
+        prompt = dc.to_system_prompt(semantic_source=source)
+        # Should NOT list individual metrics
+        assert "metric_0 \u2014" not in prompt
+        # Should show domain counts
+        assert "domain_a (15)" in prompt
+        assert "domain_b (15)" in prompt
+        assert "list_metrics" in prompt
+    def test_large_set_no_domains_shows_count(self) -> None:
+        source = FakeSemanticSource(30)
+        schema = DataContractSchema(
+            name="test",
+            semantic=SemanticConfig(
+                allowed_tables=[
+                    AllowedTable.model_validate({"schema": "public", "tables": ["t"]}),
+                ],
+            ),
+        )
+        dc = DataContract(schema)
+        prompt = dc.to_system_prompt(semantic_source=source)
+        assert "30 metrics available" in prompt
+        assert "metric_0 \u2014" not in prompt
+    def test_threshold_boundary(self) -> None:
+        # Exactly at threshold — should still list individually
+        source = FakeSemanticSource(20)
+        schema = DataContractSchema(
+            name="test",
+            semantic=SemanticConfig(
+                allowed_tables=[
+                    AllowedTable.model_validate({"schema": "public", "tables": ["t"]}),
+                ],
+            ),
+        )
+        dc = DataContract(schema)
+        prompt = dc.to_system_prompt(semantic_source=source)
+        assert "metric_0 \u2014" in prompt
+        # One above threshold — compact mode
+        source = FakeSemanticSource(21)
+        prompt = dc.to_system_prompt(semantic_source=source)
+        assert "metric_0 \u2014" not in prompt
+        assert "21 metrics available" in prompt
+class TestWildcardCaching:
+    def test_resolve_tables_caches(self) -> None:
+        dc = DataContract(
+            DataContractSchema(
+                name="test",
+                semantic=SemanticConfig(
+                    allowed_tables=[
+                        AllowedTable.model_validate({"schema": "s", "tables": ["*"]}),
+                    ],
+                ),
+            )
+        )
+        mock_adapter = MagicMock(spec=DatabaseAdapter)
+        mock_adapter.list_tables.return_value = ["t1", "t2"]
+        dc.resolve_tables(mock_adapter)
+        assert "s.t1" in dc.allowed_table_names()
+        assert mock_adapter.list_tables.call_count == 1
+        # Second call should be a no-op
+        dc.resolve_tables(mock_adapter)
+        assert mock_adapter.list_tables.call_count == 1
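
The caching test above covers the default path but not `force=True`. The sketch below models the same resolve-once pattern on a plain dict to show how the flag interacts with wildcard replacement. `WildcardResolver` and `CountingAdapter` are hypothetical stand-ins, not classes from the package. One observation, assuming the real method behaves like the hunk in `contract.py`: once `"*"` has been replaced by concrete names, a forced call finds no wildcard left to expand, so it does not query the adapter again.

```python
class CountingAdapter:
    """Hypothetical adapter that records how often list_tables is called."""

    def __init__(self, tables: list[str]) -> None:
        self.tables = tables
        self.calls = 0

    def list_tables(self, schema: str) -> list[str]:
        self.calls += 1
        return list(self.tables)


class WildcardResolver:
    """Hypothetical stand-in for the resolve-once logic in DataContract."""

    def __init__(self, allowed: dict[str, list[str]]) -> None:
        self.allowed = allowed
        self._resolved = False

    def resolve(self, adapter: CountingAdapter, *, force: bool = False) -> None:
        if self._resolved and not force:
            return  # cached: skip the database round trip
        for schema, tables in list(self.allowed.items()):
            if "*" in tables:
                self.allowed[schema] = adapter.list_tables(schema)
        self._resolved = True


adapter = CountingAdapter(["t1", "t2"])
resolver = WildcardResolver({"s": ["*"]})
resolver.resolve(adapter)
resolver.resolve(adapter)              # no-op, still one adapter call
print(adapter.calls)                   # 1
resolver.resolve(adapter, force=True)  # "*" is already expanded, nothing to re-query
print(adapter.calls)                   # 1
```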

agentic_data_contracts-0.2.6/tests/test_semantic/test_relationships.py ADDED

@@ -0,0 +1,83 @@
+"""Tests for table relationship metadata."""
+from pathlib import Path
+from agentic_data_contracts.core.contract import DataContract
+from agentic_data_contracts.core.schema import (
+    AllowedTable,
+    DataContractSchema,
+    SemanticConfig,
+)
+from agentic_data_contracts.semantic.cube import CubeSource
+from agentic_data_contracts.semantic.dbt import DbtSource
+from agentic_data_contracts.semantic.yaml_source import YamlSource
+def test_yaml_source_loads_relationships(fixtures_dir: Path) -> None:
+    source = YamlSource(fixtures_dir / "semantic_source.yml")
+    rels = source.get_relationships()
+    assert len(rels) == 1
+    assert rels[0].from_ == "analytics.orders.customer_id"
+    assert rels[0].to == "analytics.customers.id"
+    assert rels[0].type == "many_to_one"
+def test_yaml_source_no_relationships(tmp_path: Path) -> None:
+    (tmp_path / "empty.yml").write_text("metrics: []")
+    source = YamlSource(tmp_path / "empty.yml")
+    assert source.get_relationships() == []
+def test_dbt_source_returns_empty_relationships(
+    fixtures_dir: Path,
+) -> None:
+    source = DbtSource(fixtures_dir / "sample_dbt_manifest.json")
+    assert source.get_relationships() == []
+def test_cube_source_returns_empty_relationships(
+    fixtures_dir: Path,
+) -> None:
+    source = CubeSource(fixtures_dir / "sample_cube_schema.yml")
+    assert source.get_relationships() == []
+def test_system_prompt_includes_relationships(
+    fixtures_dir: Path,
+) -> None:
+    source = YamlSource(fixtures_dir / "semantic_source.yml")
+    schema = DataContractSchema(
+        name="test",
+        semantic=SemanticConfig(
+            allowed_tables=[
+                AllowedTable.model_validate(
+                    {"schema": "analytics", "tables": ["orders", "customers"]}
+                ),
+            ],
+        ),
+    )
+    dc = DataContract(schema)
+    prompt = dc.to_system_prompt(semantic_source=source)
+    assert "Table Relationships" in prompt
+    assert "analytics.orders.customer_id" in prompt
+    assert "analytics.customers.id" in prompt
+    assert "many_to_one" in prompt
+def test_system_prompt_no_relationships_when_empty(
+    fixtures_dir: Path,
+) -> None:
+    source = DbtSource(fixtures_dir / "sample_dbt_manifest.json")
+    schema = DataContractSchema(
+        name="test",
+        semantic=SemanticConfig(
+            allowed_tables=[
+                AllowedTable.model_validate(
+                    {"schema": "analytics", "tables": ["orders"]}
+                ),
+            ],
+        ),
+    )
+    dc = DataContract(schema)
+    prompt = dc.to_system_prompt(semantic_source=source)
+    assert "Table Relationships" not in prompt

{agentic_data_contracts-0.2.4 → agentic_data_contracts-0.2.6}/tests/test_semantic/test_yaml_source.py RENAMED

@@ -39,7 +39,7 @@ def test_get_metric_not_found(source: YamlSource) -> None:
 def test_get_table_schema(source: YamlSource) -> None:
     schema = source.get_table_schema("analytics", "orders")
     assert schema is not None
-    assert len(schema.columns) == 4
+    assert len(schema.columns) == 5
     col_names = [c.name for c in schema.columns]
     assert "id" in col_names
     assert "amount" in col_names

agentic_data_contracts-0.2.6/tests/test_tools/test_pagination.py ADDED

@@ -0,0 +1,80 @@
+"""Tests for list_tables pagination."""
+import json
+import pytest
+from agentic_data_contracts.core.contract import DataContract
+from agentic_data_contracts.core.schema import (
+    AllowedTable,
+    DataContractSchema,
+    SemanticConfig,
+)
+from agentic_data_contracts.tools.factory import create_tools
+@pytest.fixture
+def large_contract() -> DataContract:
+    """Contract with many tables to test pagination."""
+    tables = [f"table_{i}" for i in range(60)]
+    schema = DataContractSchema(
+        name="test",
+        semantic=SemanticConfig(
+            allowed_tables=[
+                AllowedTable.model_validate({"schema": "analytics", "tables": tables}),
+            ],
+        ),
+    )
+    return DataContract(schema)
+@pytest.mark.asyncio
+async def test_list_tables_default_limit(
+    large_contract: DataContract,
+) -> None:
+    tools = create_tools(large_contract)
+    tool = next(t for t in tools if t.name == "list_tables")
+    result = await tool.callable({})
+    data = json.loads(result["content"][0]["text"])
+    assert len(data["tables"]) == 50  # default limit
+    assert data["total"] == 60
+    assert data["next_offset"] == 50
+@pytest.mark.asyncio
+async def test_list_tables_custom_limit(
+    large_contract: DataContract,
+) -> None:
+    tools = create_tools(large_contract)
+    tool = next(t for t in tools if t.name == "list_tables")
+    result = await tool.callable({"limit": 10})
+    data = json.loads(result["content"][0]["text"])
+    assert len(data["tables"]) == 10
+    assert data["total"] == 60
+    assert data["next_offset"] == 10
+@pytest.mark.asyncio
+async def test_list_tables_with_offset(
+    large_contract: DataContract,
+) -> None:
+    tools = create_tools(large_contract)
+    tool = next(t for t in tools if t.name == "list_tables")
+    result = await tool.callable({"limit": 10, "offset": 50})
+    data = json.loads(result["content"][0]["text"])
+    assert len(data["tables"]) == 10
+    assert data["total"] == 60
+    assert "next_offset" not in data  # last page
+@pytest.mark.asyncio
+async def test_list_tables_small_set_no_next(
+    fixtures_dir,
+) -> None:
+    dc = DataContract.from_yaml(fixtures_dir / "minimal_contract.yml")
+    tools = create_tools(dc)
+    tool = next(t for t in tools if t.name == "list_tables")
+    result = await tool.callable({})
+    data = json.loads(result["content"][0]["text"])
+    assert data["total"] == 1
+    assert "next_offset" not in data

{agentic_data_contracts-0.2.4 → agentic_data_contracts-0.2.6}/uv.lock RENAMED

@@ -9,7 +9,7 @@ resolution-markers = [
 [[package]]
 name = "agentic-data-contracts"
-version = "0.2.4"
+version = "0.2.6"
 source = { editable = "." }
 dependencies = [
     { name = "pydantic" },

agentic_data_contracts-0.2.4/examples/revenue_agent/semantic.yml DELETED

@@ -1,51 +0,0 @@
-metrics:
-  - name: total_revenue
-    description: "Total revenue from completed orders"
-    sql_expression: "SUM(amount) FILTER (WHERE status = 'completed')"
-    source_model: analytics.orders
-    filters:
-      - "status = 'completed'"
-  - name: revenue_by_region
-    description: "Revenue broken down by customer region"
-    sql_expression: "SUM(o.amount) GROUP BY c.region"
-    source_model: analytics.orders
-    filters:
-      - "o.status = 'completed'"
-tables:
-  - schema: analytics
-    table: orders
-    columns:
-      - name: id
-        type: INTEGER
-        description: "Order ID"
-      - name: customer_id
-        type: INTEGER
-        description: "FK to customers"
-      - name: amount
-        type: DECIMAL
-        description: "Order total in USD"
-      - name: status
-        type: VARCHAR
-        description: "pending, completed, cancelled"
-      - name: tenant_id
-        type: VARCHAR
-        description: "Tenant identifier"
-      - name: created_at
-        type: DATE
-        description: "Order date"
-  - schema: analytics
-    table: customers
-    columns:
-      - name: id
-        type: INTEGER
-        description: "Customer ID"
-      - name: name
-        type: VARCHAR
-        description: "Customer name"
-      - name: region
-        type: VARCHAR
-        description: "Geographic region"
-      - name: tenant_id
-        type: VARCHAR
-        description: "Tenant identifier"