PyPI - aetherdialect - Versions diffs - 0.1.1__tar.gz → 0.1.2__tar.gz - Mend

aetherdialect 0.1.1__tar.gz → 0.1.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (70)

{aetherdialect-0.1.1/src/aetherdialect.egg-info → aetherdialect-0.1.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: aetherdialect
-Version: 0.1.1
+Version: 0.1.2
 Summary: Deterministic, validation-first Text-to-SQL system for business databases
 Author-email: Akul Ameya <akul.ameya@gmail.com>
 License: MIT
@@ -36,6 +36,8 @@ Dynamic: license-file
 # Deterministic, validation-first Text-to-SQL for business databases
+Questions resolve more reliably when you state analytical intent explicitly—entities, grain, filters, time scope, and ordering—instead of leaving those details implied.
 ## Installation
 ```bash
@@ -47,11 +49,11 @@ pip install "text2sql[postgresql,databricks]"
 Requires Python ≥ 3.10 and either an [OpenAI API key](https://platform.openai.com/api-keys) or Azure OpenAI credentials.
-| Extra        | Brings in                                                                 | Use when              |
-| ------------ | ------------------------------------------------------------------------- | --------------------- |
-| (base)       | **SQLAlchemy** (shared introspection / execution interface)               | Always installed      |
-| `postgresql` | PostgreSQL driver (`psycopg2-binary`), **`pglast`**                       | `engine="postgresql"` |
-| `databricks` | PySpark, Databricks SQL connector, **`databricks-sqlalchemy`**, `sqlglot` | `engine="databricks"` |
+| Extra        | Brings in                                                                                        | Use when              |
+| ------------ | ------------------------------------------------------------------------------------------------ | --------------------- |
+| (base)       | **SQLAlchemy** (shared introspection / execution interface)                                      | Always installed      |
+| `postgresql` | PostgreSQL driver (`psycopg2-binary`), **`pglast`**                                              | `engine="postgresql"` |
+| `databricks` | Databricks SQL connector (preferred), PySpark (fallback), **`databricks-sqlalchemy`**, `sqlglot` | `engine="databricks"` |
 **SQL parsing for validation:** PostgreSQL uses **`pglast`** for structural AST checks (join pairs, CTE bodies, `ast_validate`). Databricks / Spark SQL uses **`sqlglot`** with the **Spark** dialect.
@@ -73,7 +75,9 @@ t2s = Text2SQL(
 t2s.run_interactive()
 ```
-Constructor options, modes, optional files, and the full method list are in **[USAGE.md](USAGE.md)**.
+Constructor options, credentials (`set_openai_api_key`, `set_azure_openai_api_key`, `set_env`), modes, and the full API are in **[USAGE.md](USAGE.md)**. Pass **`artifacts_dir=`** to put the per-connection cache under a root you choose; otherwise it lives under the platform user-data directory (see USAGE.md).
+**Interactive two ways:** **`run_interactive()`** is a stdin loop. For your own UI or protocol, use **`Text2SQL.pipeline_session()`** and drive **`PipelineSession`** step by step: one natural-language question is a **turn** that may return several **`SessionEvent`** objects (prompt / answer / …) until **`done`** is true. Types and methods are documented in **[USAGE.md](USAGE.md)**.
 ---
@@ -111,7 +115,7 @@ A **validation-first** text-to-SQL layer for **PostgreSQL** and **Databricks**.
 - Optional **`CREATE TABLE` file** as extra or fallback (especially when you cannot reach all metadata from the driver).
 - Cached schema snapshot per connection fingerprint so restarts avoid re-reflecting unchanged databases.
 - **Table roles** (e.g. fact vs dimension), **column roles** (measure, categorical, temporal, identifier, etc.), **filter / aggregation / HAVING** allowances per column, **value domains** from profiling — all assigned when the graph is built (reflection, DDL, profiling, and optional notes).
-- Optional **human notes** (plain text), via `Text2SQL(..., notes_file=...)` (see **[USAGE.md](USAGE.md)**): read once when the schema graph is **first** created (not on cache load). The LLM uses them to refine **table and column roles**, natural-language **descriptions**, and optional **sensitivity** labels — without renaming tables or inventing columns.
+- Optional **human notes** (plain text), via `Text2SQL(..., notes_file=...)` (see **[USAGE.md](USAGE.md)**): merged when the graph is built or when notes change; if the cache already contains notes and you omit `notes_file` on a later run, cached roles and hints are kept.
 - Optional **deny lists** for tables or columns so they stay out of prompts and can be stripped from intents.
 **Intent / SQL shape (analytical subset)**
@@ -130,7 +134,7 @@ A **validation-first** text-to-SQL layer for **PostgreSQL** and **Databricks**.
 **Operational modes**
-- **Interactive** — ask questions, accept/reject, results export.
+- **Interactive** — ask questions, accept/reject, results export; via **`run_interactive()`** or a programmatic **`PipelineSession`** (see Quickstart above and **[USAGE.md](USAGE.md)**).
 - **Coverage simulator** — seed questions → gold intents → **deterministic expansion** (many operators, deduplicated) → validate/execute → NL question generation for new templates.
 - **QSim** — reproducible synthetic questions from schema and profiles (seeded randomness).
@@ -178,7 +182,7 @@ A **validation-first** text-to-SQL layer for **PostgreSQL** and **Databricks**.
 3. Resolve joins once per table set where possible; validate and **execute** as a gate.
 4. One LLM step to produce a **natural language question** from the SQL, with a **realism** filter.
-The simulator loads at most **500** seed lines per run (fixed internal cap; larger files are truncated). **`estimate_simulator_costs`** uses the same loader, so its seed count reflects that cap. It returns **rough** LLM-call and execution estimates from a seed file and schema stats.
+The simulator loads at most **500** seed lines per run (fixed internal cap; larger files are truncated). **`estimate_simulator_costs`** uses the same loader, prints **rough** per-phase upper-bound lines to stdout, and returns **`None`**.
 ---

{aetherdialect-0.1.1 → aetherdialect-0.1.2}/README.md RENAMED

@@ -1,5 +1,7 @@
 # Deterministic, validation-first Text-to-SQL for business databases
+Questions resolve more reliably when you state analytical intent explicitly—entities, grain, filters, time scope, and ordering—instead of leaving those details implied.
 ## Installation
 ```bash
@@ -11,11 +13,11 @@ pip install "text2sql[postgresql,databricks]"
 Requires Python ≥ 3.10 and either an [OpenAI API key](https://platform.openai.com/api-keys) or Azure OpenAI credentials.
-| Extra        | Brings in                                                                 | Use when              |
-| ------------ | ------------------------------------------------------------------------- | --------------------- |
-| (base)       | **SQLAlchemy** (shared introspection / execution interface)               | Always installed      |
-| `postgresql` | PostgreSQL driver (`psycopg2-binary`), **`pglast`**                       | `engine="postgresql"` |
-| `databricks` | PySpark, Databricks SQL connector, **`databricks-sqlalchemy`**, `sqlglot` | `engine="databricks"` |
+| Extra        | Brings in                                                                                        | Use when              |
+| ------------ | ------------------------------------------------------------------------------------------------ | --------------------- |
+| (base)       | **SQLAlchemy** (shared introspection / execution interface)                                      | Always installed      |
+| `postgresql` | PostgreSQL driver (`psycopg2-binary`), **`pglast`**                                              | `engine="postgresql"` |
+| `databricks` | Databricks SQL connector (preferred), PySpark (fallback), **`databricks-sqlalchemy`**, `sqlglot` | `engine="databricks"` |
 **SQL parsing for validation:** PostgreSQL uses **`pglast`** for structural AST checks (join pairs, CTE bodies, `ast_validate`). Databricks / Spark SQL uses **`sqlglot`** with the **Spark** dialect.
@@ -37,7 +39,9 @@ t2s = Text2SQL(
 t2s.run_interactive()
 ```
-Constructor options, modes, optional files, and the full method list are in **[USAGE.md](USAGE.md)**.
+Constructor options, credentials (`set_openai_api_key`, `set_azure_openai_api_key`, `set_env`), modes, and the full API are in **[USAGE.md](USAGE.md)**. Pass **`artifacts_dir=`** to put the per-connection cache under a root you choose; otherwise it lives under the platform user-data directory (see USAGE.md).
+**Interactive two ways:** **`run_interactive()`** is a stdin loop. For your own UI or protocol, use **`Text2SQL.pipeline_session()`** and drive **`PipelineSession`** step by step: one natural-language question is a **turn** that may return several **`SessionEvent`** objects (prompt / answer / …) until **`done`** is true. Types and methods are documented in **[USAGE.md](USAGE.md)**.
 ---
@@ -75,7 +79,7 @@ A **validation-first** text-to-SQL layer for **PostgreSQL** and **Databricks**.
 - Optional **`CREATE TABLE` file** as extra or fallback (especially when you cannot reach all metadata from the driver).
 - Cached schema snapshot per connection fingerprint so restarts avoid re-reflecting unchanged databases.
 - **Table roles** (e.g. fact vs dimension), **column roles** (measure, categorical, temporal, identifier, etc.), **filter / aggregation / HAVING** allowances per column, **value domains** from profiling — all assigned when the graph is built (reflection, DDL, profiling, and optional notes).
-- Optional **human notes** (plain text), via `Text2SQL(..., notes_file=...)` (see **[USAGE.md](USAGE.md)**): read once when the schema graph is **first** created (not on cache load). The LLM uses them to refine **table and column roles**, natural-language **descriptions**, and optional **sensitivity** labels — without renaming tables or inventing columns.
+- Optional **human notes** (plain text), via `Text2SQL(..., notes_file=...)` (see **[USAGE.md](USAGE.md)**): merged when the graph is built or when notes change; if the cache already contains notes and you omit `notes_file` on a later run, cached roles and hints are kept.
 - Optional **deny lists** for tables or columns so they stay out of prompts and can be stripped from intents.
 **Intent / SQL shape (analytical subset)**
@@ -94,7 +98,7 @@ A **validation-first** text-to-SQL layer for **PostgreSQL** and **Databricks**.
 **Operational modes**
-- **Interactive** — ask questions, accept/reject, results export.
+- **Interactive** — ask questions, accept/reject, results export; via **`run_interactive()`** or a programmatic **`PipelineSession`** (see Quickstart above and **[USAGE.md](USAGE.md)**).
 - **Coverage simulator** — seed questions → gold intents → **deterministic expansion** (many operators, deduplicated) → validate/execute → NL question generation for new templates.
 - **QSim** — reproducible synthetic questions from schema and profiles (seeded randomness).
@@ -142,7 +146,7 @@ A **validation-first** text-to-SQL layer for **PostgreSQL** and **Databricks**.
 3. Resolve joins once per table set where possible; validate and **execute** as a gate.
 4. One LLM step to produce a **natural language question** from the SQL, with a **realism** filter.
-The simulator loads at most **500** seed lines per run (fixed internal cap; larger files are truncated). **`estimate_simulator_costs`** uses the same loader, so its seed count reflects that cap. It returns **rough** LLM-call and execution estimates from a seed file and schema stats.
+The simulator loads at most **500** seed lines per run (fixed internal cap; larger files are truncated). **`estimate_simulator_costs`** uses the same loader, prints **rough** per-phase upper-bound lines to stdout, and returns **`None`**.
 ---
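
According to the README hunk above, `estimate_simulator_costs` now prints its rough estimates to stdout and returns `None`, so a caller that wants the numbers programmatically would have to capture the printed text. The following is a minimal sketch of that capture using only the standard library. `fake_estimate_costs` is a hypothetical stand-in, not the package's method, and the lines it prints are invented for illustration; the real output format is not shown in this diff.

```python
import contextlib
import io
from typing import Callable


def fake_estimate_costs() -> None:
    """Hypothetical stand-in: prints rough estimate lines and returns None."""
    print("phase=seed_normalization llm_calls<=25")
    print("phase=expansion executions<=400")


def capture_stdout_lines(func: Callable[[], object]) -> tuple[object, list[str]]:
    """Call func and return (its return value, the lines it printed to stdout)."""
    buffer = io.StringIO()
    # redirect_stdout temporarily points sys.stdout at the in-memory buffer.
    with contextlib.redirect_stdout(buffer):
        result = func()
    return result, buffer.getvalue().splitlines()


result, lines = capture_stdout_lines(fake_estimate_costs)
```

Here `result` carries no data and `lines` holds the printed estimates, which is the shape a caller would need to parse if the 0.1.2 behaviour is as the README describes.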

{aetherdialect-0.1.1 → aetherdialect-0.1.2}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "aetherdialect"
-version = "0.1.1"
+version = "0.1.2"
 description = "Deterministic, validation-first Text-to-SQL system for business databases"
 readme = "README.md"
 requires-python = ">=3.10"

{aetherdialect-0.1.1 → aetherdialect-0.1.2/src/aetherdialect.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: aetherdialect
-Version: 0.1.1
+Version: 0.1.2
 Summary: Deterministic, validation-first Text-to-SQL system for business databases
 Author-email: Akul Ameya <akul.ameya@gmail.com>
 License: MIT
@@ -36,6 +36,8 @@ Dynamic: license-file
 # Deterministic, validation-first Text-to-SQL for business databases
+Questions resolve more reliably when you state analytical intent explicitly—entities, grain, filters, time scope, and ordering—instead of leaving those details implied.
 ## Installation
 ```bash
@@ -47,11 +49,11 @@ pip install "text2sql[postgresql,databricks]"
 Requires Python ≥ 3.10 and either an [OpenAI API key](https://platform.openai.com/api-keys) or Azure OpenAI credentials.
-| Extra        | Brings in                                                                 | Use when              |
-| ------------ | ------------------------------------------------------------------------- | --------------------- |
-| (base)       | **SQLAlchemy** (shared introspection / execution interface)               | Always installed      |
-| `postgresql` | PostgreSQL driver (`psycopg2-binary`), **`pglast`**                       | `engine="postgresql"` |
-| `databricks` | PySpark, Databricks SQL connector, **`databricks-sqlalchemy`**, `sqlglot` | `engine="databricks"` |
+| Extra        | Brings in                                                                                        | Use when              |
+| ------------ | ------------------------------------------------------------------------------------------------ | --------------------- |
+| (base)       | **SQLAlchemy** (shared introspection / execution interface)                                      | Always installed      |
+| `postgresql` | PostgreSQL driver (`psycopg2-binary`), **`pglast`**                                              | `engine="postgresql"` |
+| `databricks` | Databricks SQL connector (preferred), PySpark (fallback), **`databricks-sqlalchemy`**, `sqlglot` | `engine="databricks"` |
 **SQL parsing for validation:** PostgreSQL uses **`pglast`** for structural AST checks (join pairs, CTE bodies, `ast_validate`). Databricks / Spark SQL uses **`sqlglot`** with the **Spark** dialect.
@@ -73,7 +75,9 @@ t2s = Text2SQL(
 t2s.run_interactive()
 ```
-Constructor options, modes, optional files, and the full method list are in **[USAGE.md](USAGE.md)**.
+Constructor options, credentials (`set_openai_api_key`, `set_azure_openai_api_key`, `set_env`), modes, and the full API are in **[USAGE.md](USAGE.md)**. Pass **`artifacts_dir=`** to put the per-connection cache under a root you choose; otherwise it lives under the platform user-data directory (see USAGE.md).
+**Interactive two ways:** **`run_interactive()`** is a stdin loop. For your own UI or protocol, use **`Text2SQL.pipeline_session()`** and drive **`PipelineSession`** step by step: one natural-language question is a **turn** that may return several **`SessionEvent`** objects (prompt / answer / …) until **`done`** is true. Types and methods are documented in **[USAGE.md](USAGE.md)**.
 ---
@@ -111,7 +115,7 @@ A **validation-first** text-to-SQL layer for **PostgreSQL** and **Databricks**.
 - Optional **`CREATE TABLE` file** as extra or fallback (especially when you cannot reach all metadata from the driver).
 - Cached schema snapshot per connection fingerprint so restarts avoid re-reflecting unchanged databases.
 - **Table roles** (e.g. fact vs dimension), **column roles** (measure, categorical, temporal, identifier, etc.), **filter / aggregation / HAVING** allowances per column, **value domains** from profiling — all assigned when the graph is built (reflection, DDL, profiling, and optional notes).
-- Optional **human notes** (plain text), via `Text2SQL(..., notes_file=...)` (see **[USAGE.md](USAGE.md)**): read once when the schema graph is **first** created (not on cache load). The LLM uses them to refine **table and column roles**, natural-language **descriptions**, and optional **sensitivity** labels — without renaming tables or inventing columns.
+- Optional **human notes** (plain text), via `Text2SQL(..., notes_file=...)` (see **[USAGE.md](USAGE.md)**): merged when the graph is built or when notes change; if the cache already contains notes and you omit `notes_file` on a later run, cached roles and hints are kept.
 - Optional **deny lists** for tables or columns so they stay out of prompts and can be stripped from intents.
 **Intent / SQL shape (analytical subset)**
@@ -130,7 +134,7 @@ A **validation-first** text-to-SQL layer for **PostgreSQL** and **Databricks**.
 **Operational modes**
-- **Interactive** — ask questions, accept/reject, results export.
+- **Interactive** — ask questions, accept/reject, results export; via **`run_interactive()`** or a programmatic **`PipelineSession`** (see Quickstart above and **[USAGE.md](USAGE.md)**).
 - **Coverage simulator** — seed questions → gold intents → **deterministic expansion** (many operators, deduplicated) → validate/execute → NL question generation for new templates.
 - **QSim** — reproducible synthetic questions from schema and profiles (seeded randomness).
@@ -178,7 +182,7 @@ A **validation-first** text-to-SQL layer for **PostgreSQL** and **Databricks**.
 3. Resolve joins once per table set where possible; validate and **execute** as a gate.
 4. One LLM step to produce a **natural language question** from the SQL, with a **realism** filter.
-The simulator loads at most **500** seed lines per run (fixed internal cap; larger files are truncated). **`estimate_simulator_costs`** uses the same loader, so its seed count reflects that cap. It returns **rough** LLM-call and execution estimates from a seed file and schema stats.
+The simulator loads at most **500** seed lines per run (fixed internal cap; larger files are truncated). **`estimate_simulator_costs`** uses the same loader, prints **rough** per-phase upper-bound lines to stdout, and returns **`None`**.
 ---

{aetherdialect-0.1.1 → aetherdialect-0.1.2}/src/aetherdialect.egg-info/SOURCES.txt RENAMED

@@ -46,6 +46,7 @@ tests/test_intent_resolve.py
 tests/test_join_bool_cte_matrix.py
 tests/test_live_testing.py
 tests/test_main_execution.py
+tests/test_pipeline_session.py
 tests/test_pipeline_targeted.py
 tests/test_pipeline_units.py
 tests/test_qsim_ops.py

{aetherdialect-0.1.1 → aetherdialect-0.1.2}/src/text2sql/__init__.py RENAMED

@@ -1,10 +1,10 @@
-from importlib.metadata import PackageNotFoundError, version
-from .text2sql import Text2SQL
-try:
-    __version__ = version("aetherdialect")
-except PackageNotFoundError:
-    __version__ = "0.0.0+dev"
-__all__ = ["Text2SQL", "__version__"]
+from importlib.metadata import PackageNotFoundError, version
+from .text2sql import Text2SQL
+try:
+    __version__ = version("aetherdialect")
+except PackageNotFoundError:
+    __version__ = "0.0.0+dev"
+__all__ = ["Text2SQL", "__version__"]

{aetherdialect-0.1.1 → aetherdialect-0.1.2}/src/text2sql/config.py RENAMED

@@ -5,11 +5,13 @@ from __future__ import annotations
 import os
 import re
 from enum import Enum
-from typing import ClassVar, Protocol, runtime_checkable
+from typing import Any, ClassVar, Protocol, runtime_checkable
 from urllib.parse import quote
 SUPPORTED_ENGINES: frozenset[str] = frozenset({"postgresql", "databricks"})
+JSON_COMPACT_SEPARATORS: tuple[str, str] = (",", ":")
 @runtime_checkable
 class RuntimeConfig(Protocol):
@@ -117,6 +119,7 @@ _AGGREGATION_FUNCTION_NAMES_ORDERED: tuple[str, ...] = (
     "max",
 )
 VALID_AGGREGATION_FUNCTIONS = frozenset(_AGGREGATION_FUNCTION_NAMES_ORDERED)
+SELECTABILITY_AGG_FUNCS: frozenset[str] = VALID_AGGREGATION_FUNCTIONS
 SQL_AGG_FUNC_CALL_RE = re.compile(
     r"\b(?:count|sum|avg|min|max)\s*\(",
     re.IGNORECASE,
@@ -421,6 +424,20 @@ BOOLEAN_TRUE_FALSE_MAP: dict[frozenset[str], tuple[str, str]] = {
     frozenset(["active", "inactive"]): ("active", "inactive"),
     frozenset(["enabled", "disabled"]): ("enabled", "disabled"),
 }
+BOOLEAN_NEGATION_PREFIXES: tuple[str, ...] = ("no ", "not ", "non ", "non-", "un", "in")
+BOOLEAN_NEGATION_SUFFIXES: tuple[str, ...] = ()
+BOOLEAN_ANTONYM_MIN_STEM_LEN: int = 3
+BOOLEAN_AFFIRMATIVE_STRIP_PREFIXES: tuple[str, ...] = ("a ", "an ")
+ARTIFACT_FORMAT_VERSION: int = 2
+DESTRUCTIVE_REBUILD_ON_FORMAT_MISMATCH: bool = False
+FAILURE_HINT_MAX_RECORDS: int = 200
+FAILURE_HINT_MAX_CHARS_PER_RECORD: int = 500
+FAILURE_HINT_MAX_MESSAGES: int = 5
+FAILURE_HINT_MAX_INJECT_CHARS: int = 1200
+FAILURE_HINT_FUZZY: bool = False
 NUMERIC_TYPE_TOKENS = frozenset(
     {
         "int",
@@ -778,11 +795,11 @@ def normalize_value_type(value_type: str) -> str:
     Args:
-    value_type: LLM or schema value type.
+        value_type: LLM or schema value type.
     Returns:
-    Normalised name from `VALUE_TYPE_NORMALIZATION` / `VALID_VALUE_TYPES`, else `'string'`.
+        Normalised name from `VALUE_TYPE_NORMALIZATION` / `VALID_VALUE_TYPES`, else `'string'`.
     """
     if not value_type:
         return "string"
@@ -800,11 +817,11 @@ def normalize_column_type(col_type: str) -> str:
     Args:
-    col_type: Raw SQL type (e.g. `VARCHAR(255)`).
+        col_type: Raw SQL type (e.g. `VARCHAR(255)`).
     Returns:
-    Base type name for lookup tables.
+        Base type name for lookup tables.
     """
     normalized = col_type.lower().strip()
     normalized = re.sub(r"\(\d+(?:,\s*\d+)?\)", "", normalized)
@@ -1023,11 +1040,11 @@ class PostgresRuntimeConfig:
         Returns:
-        SQLAlchemy connection URL.
+            SQLAlchemy connection URL.
         Raises:
-        ValueError: If `PASSWORD` or `DATABASE` is unset.
+            ValueError: If `PASSWORD` or `DATABASE` is unset.
         """
         if not cls.PASSWORD:
             raise ValueError("PostgreSQL password required")
@@ -1057,7 +1074,7 @@ class DatabricksRuntimeConfig:
         Returns:
-        Whether `databricks-sql-connector` can be used.
+            Whether `databricks-sql-connector` can be used.
         """
         return bool(cls.SERVER_HOSTNAME and cls.HTTP_PATH and cls.ACCESS_TOKEN)
@@ -1068,11 +1085,11 @@ class DatabricksRuntimeConfig:
         Returns:
-        None.
+            None.
         Raises:
-        ValueError: If either identifier is missing.
+            ValueError: If either identifier is missing.
         """
         if not cls.CATALOG:
             raise ValueError("Databricks catalog required")
@@ -1086,7 +1103,7 @@ class DatabricksRuntimeConfig:
         Returns:
-        URL string, or ``None`` when native warehouse credentials are not configured.
+            URL string, or ``None`` when native warehouse credentials are not configured.
         """
         if not cls.has_native_connection():
             return None
@@ -1117,8 +1134,8 @@ class EngineConfig:
     AZURE_OPENAI_BASE_URL: ClassVar[str | None] = os.environ.get("AZURE_OPENAI_BASE_URL")
     AZURE_OPENAI_ENDPOINT: ClassVar[str | None] = os.environ.get("AZURE_OPENAI_ENDPOINT")
-    SCHEMA_JSON_PATH: ClassVar[str] = "schema_graph.json"
-    TEMPLATE_JSON_PATH: ClassVar[str] = "intent_templates.json"
+    SCHEMA_JSON_PATH: ClassVar[str] = "schema_graph.json.gz"
+    TEMPLATE_JSON_PATH: ClassVar[str] = "intent_templates.json.gz"
     @classmethod
     def azure_base_url(cls) -> str | None:
@@ -1171,12 +1188,86 @@ class QSimConfig:
     EXCLUDED_FILTER_PATTERNS = EXCLUDED_FILTER_PATTERNS
-    SKELETONS_JSON_PATH = "qsim_skeletons.json"
-    QUESTIONS_OUTPUT_PATH = "qsim_intents_with_questions.json"
+    SKELETONS_JSON_PATH = "qsim_skeletons.json.gz"
     MAX_ROLE_CLASSIFICATION_RETRIES = 2
+class ExpansionOperatorId:
+    """Stable expansion operator ids; registry keys and expansion metadata stamps."""
+    FILTER_ADD = "FILTER_ADD"
+    FILTER_EXPR_ADD = "FILTER_EXPR_ADD"
+    AGG_CHANGE = "AGG_CHANGE"
+    GROUPBY_ADD = "GROUPBY_ADD"
+    ORDERBY_ADD = "ORDERBY_ADD"
+    HAVING_VALUE_ADD = "HAVING_VALUE_ADD"
+    HAVING_EXPR_ADD = "HAVING_EXPR_ADD"
+    FILTER_REMOVE = "FILTER_REMOVE"
+    GROUPBY_REMOVE = "GROUPBY_REMOVE"
+    HAVING_REMOVE = "HAVING_REMOVE"
+    JOIN_DIMENSION_ADD = "JOIN_DIMENSION_ADD"
+    JOIN_FACT_ADD = "JOIN_FACT_ADD"
+    DIMENSION_SWAP = "DIMENSION_SWAP"
+    TABLE_REMOVE = "TABLE_REMOVE"
+    BRIDGE_INTERMEDIATE_ADD = "BRIDGE_INTERMEDIATE_ADD"
+    INCLUDE_GOLD = "INCLUDE_GOLD"
+    TEMP_EXTRACT_GROUPBY = "TEMP_EXTRACT_GROUPBY"
+    TEMP_DATE_TRUNC_GROUPBY = "TEMP_DATE_TRUNC_GROUPBY"
+    TEMP_DATE_WINDOW_FILTER = "TEMP_DATE_WINDOW_FILTER"
+    TEMP_DATE_DIFF_FILTER = "TEMP_DATE_DIFF_FILTER"
+    NUM_ROUND_SELECT = "NUM_ROUND_SELECT"
+    NUM_ABS_FILTER = "NUM_ABS_FILTER"
+    DISTINCT_ADD = "DISTINCT_ADD"
+    LIMIT_ADD = "LIMIT_ADD"
+    FILTER_OR_GROUP = "FILTER_OR_GROUP"
+    SELECT_EXPR_PAIR_MULTIPLY = "SELECT_EXPR_PAIR_MULTIPLY"
+    WINDOW_RANK_ADD = "WINDOW_RANK_ADD"
+    WINDOW_SUM_PARTITION_ADD = "WINDOW_SUM_PARTITION_ADD"
+    SELECT_CASE_LABEL_ADD = "SELECT_CASE_LABEL_ADD"
+    WINDOW_LAG_ADD = "WINDOW_LAG_ADD"
+    WINDOW_LEAD_ADD = "WINDOW_LEAD_ADD"
+    FILTER_ILIKE_ADD = "FILTER_ILIKE_ADD"
+    FILTER_ARRAY_CONTAINS_ADD = "FILTER_ARRAY_CONTAINS_ADD"
+    ORDERBY_REMOVE = "ORDERBY_REMOVE"
+    LIMIT_REMOVE = "LIMIT_REMOVE"
+    SELECT_COL_TRIM = "SELECT_COL_TRIM"
+    WINDOW_STRIP = "WINDOW_STRIP"
+    DISTINCT_REMOVE = "DISTINCT_REMOVE"
+SEED_NORMALIZATION_BATCH_SIZE: int = 20
+INTERACTIVE_STAGE_DIRECT_REUSE = "direct_reuse_confirm"
+INTERACTIVE_STAGE_INTENT_WARNINGS = "intent_semantic_warnings"
+INTERACTIVE_STAGE_INTENT_CONFIRM = "intent_confirm"
+INTERACTIVE_STAGE_SCHEMA_INVALID = "intent_schema_invalid_continue"
+INTERACTIVE_STAGE_HARD_BLOCK = "hard_block_override"
+INTERACTIVE_STAGE_SQL_FEEDBACK = "sql_result_confirm"
+PIPELINE_SUSPEND_ID_DIRECT_REUSE = "awaiting_direct_reuse_confirmation"
+PIPELINE_SUSPEND_ID_INTENT_WARNINGS = "awaiting_intent_semantic_continue"
+PIPELINE_SUSPEND_ID_INTENT_CONFIRM = "awaiting_intent_confirmation"
+PIPELINE_SUSPEND_ID_HARD_BLOCK = "awaiting_hard_block_override"
+PIPELINE_SUSPEND_ID_SQL = "awaiting_sql_result_confirmation"
+PIPELINE_SUSPEND_ID_SCHEMA_INVALID = "awaiting_schema_invalid_continuation"
+SEED_NORMALIZATION_JSON = "seed_question_normalization.json"
+NORMALIZED_SEEDS_TXT = "seed_questions_normalized.txt"
+QSIM_QUESTIONS_PATTERN = "qsim_questions_v{version}.txt"
+REALISM_DROP_REASON_CATEGORIES: frozenset[str] = frozenset(
+    {
+        "nonsensical_sql",
+        "tautology",
+        "overly_narrow_filter",
+        "pii_smell",
+        "unmeasurable_metric",
+        "other",
+    }
+)
 class SimulatorConfig:
     """Simulator expansion depth, output filename patterns, and date/limit presets."""
@@ -1190,10 +1281,8 @@ class SimulatorConfig:
     MAX_HAVING_CONDITIONS = 2
     MAX_EXPANSION_DEPTH = 2
-    GOLD_OUTPUT_PATTERN = "gold_intents_v{version}.json"
+    SIMULATOR_BUNDLE_PATTERN = "simulator_v{version}.zip"
     REPORT_PATTERN = "simulation_report_v{version}.json"
-    RESULTS_CSV_PATTERN = "simulation_results_v{version}.csv"
-    FAILURES_PATTERN = "simulation_failures_v{version}.json"
     RANDOM_SEED = DEFAULT_RANDOM_SEED
@@ -1217,3 +1306,6 @@ class SimulatorConfig:
         {"unit": "day", "amount": 30},
         {"unit": "day", "amount": 90},
     ]
+EMPTY_JOIN_CANDIDATES: dict[str, Any] = {"candidates": []}
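
The config.py hunks above move the cached artifact paths to `.json.gz` and add `JSON_COMPACT_SEPARATORS`. The code that actually reads and writes those files is not part of the hunks shown here, so the following is only a sketch of how compact, gzip-compressed JSON is commonly round-tripped with the standard library. `write_json_gz` and `read_json_gz` are hypothetical helper names, not functions from the package.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Any

# Same value as the constant added in config.py: no whitespace after "," or ":".
JSON_COMPACT_SEPARATORS: tuple[str, str] = (",", ":")


def write_json_gz(path: Path, payload: Any) -> None:
    """Serialise payload as compact JSON and gzip it (hypothetical helper)."""
    text = json.dumps(payload, separators=JSON_COMPACT_SEPARATORS)
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write(text)


def read_json_gz(path: Path) -> Any:
    """Load a gzip-compressed JSON file such as one written by write_json_gz."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Round trip in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "schema_graph.json.gz"
    write_json_gz(target, {"candidates": []})
    restored = read_json_gz(target)
```

Compact separators remove the padding that `json.dumps` inserts by default, and gzip then compresses the remaining text; whether the package combines the two in exactly this way is an assumption.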
