PyPI - aetherdialect - Versions diffs - 0.1.3__tar.gz → 0.1.4__tar.gz - Mend

aetherdialect 0.1.3__tar.gz → 0.1.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (81)

{aetherdialect-0.1.3/src/aetherdialect.egg-info → aetherdialect-0.1.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: aetherdialect
-Version: 0.1.3
+Version: 0.1.4
 Summary: Deterministic, validation-first Text-to-SQL system for business databases
 Author-email: Akul Ameya <akul.ameya@gmail.com>
 License: MIT
@@ -8,6 +8,7 @@ Project-URL: Homepage, https://github.com/akul-ameya/aetherdialect
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
+Requires-Dist: pandas<3,>=2.0
 Requires-Dist: packaging<25,>=23.0
 Requires-Dist: jsonschema<5,>=4.0
 Requires-Dist: openai<3,>=2.0.0
@@ -38,9 +39,9 @@ Dynamic: license-file
 This library turns **analytical questions** into **read-only `SELECT`** pipelines on **PostgreSQL** or **Databricks**: structured intent, heavy validation (including dialect AST and `EXPLAIN`), optional **template reuse** from accepted answers, and **negative memory** from rejections. When you construct **`Text2SQL`**, it checks **database connectivity**, **LLM reachability**, and whether **on-disk artifacts** still match the live schema.
-**Practical tips:** Questions resolve more reliably when you state intent explicitly—entities, grain, filters, time scope, and ordering—instead of leaving those details implied. The same goes for optional domain notes (`SchemaContext.notes_path`, see **[API_REFERENCE.md](API_REFERENCE.md)**): richer notes and clearer questions generally improve routing speed and SQL quality.
+**Practical tips:** Questions resolve more reliably when you state intent explicitly—entities, grain, filters, time scope, and ordering—instead of leaving those details implied. The same goes for optional domain notes (`SchemaContext.notes_file`, see **[API_REFERENCE.md](https://github.com/akul-ameya/aetherdialect/blob/main/API_REFERENCE.md)**): richer notes and clearer questions generally improve routing speed and SQL quality.
-**Internals:** for how the schema graph is built, how columns become visible to the LLM, how joins are picked, and how stored artifacts migrate when the schema drifts, see **[OVERVIEW.md](OVERVIEW.md)**.
+**Internals:** for how the schema graph is built, how columns become visible to the LLM, how joins are picked, and how stored artifacts migrate when the schema drifts, see **[OVERVIEW.md](https://github.com/akul-ameya/aetherdialect/blob/main/OVERVIEW.md)**.
 ## Installation
@@ -51,13 +52,13 @@ pip install "aetherdialect[databricks]"
 pip install "aetherdialect[postgresql,databricks]"
 ```
-Requires Python ≥ 3.10 and either an [OpenAI API key](https://platform.openai.com/api-keys) or Azure OpenAI credentials. Construction verifies LLM connectivity for **each distinct** model or deployment the run uses; see **[API_REFERENCE.md](API_REFERENCE.md)** for required variables, optional deployment-name overrides on Azure, and Databricks SQL warehouse vs PySpark.
+Requires Python ≥ 3.10 and either an [OpenAI API key](https://platform.openai.com/api-keys) or Azure OpenAI credentials. Construction verifies LLM connectivity for **each distinct** model or deployment the run uses; see **[API_REFERENCE.md](https://github.com/akul-ameya/aetherdialect/blob/main/API_REFERENCE.md)** for required variables, optional deployment-name overrides on Azure, and Databricks SQL warehouse vs PySpark.
 | Extra        | Brings in                                                                                        | Use when                                                                   |
 | ------------ | ------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------- |
 | (base)       | **SQLAlchemy** (shared introspection / execution interface)                                      | Always installed                                                           |
-| `postgresql` | PostgreSQL driver (`psycopg2-binary`), **`pglast`**                                              | PostgreSQL via `PG*` / `POSTGRES_*` env (see **API_REFERENCE.md**)         |
-| `databricks` | Databricks SQL connector (preferred), PySpark (fallback), **`databricks-sqlalchemy`**, `sqlglot` | Databricks via `DATABRICKS_*` / related aliases (see **API_REFERENCE.md**) |
+| `postgresql` | PostgreSQL driver (`psycopg2-binary`), **`pglast`**                                              | PostgreSQL via `PG*` / `POSTGRES_*` env (see **[API_REFERENCE.md](https://github.com/akul-ameya/aetherdialect/blob/main/API_REFERENCE.md)**)         |
+| `databricks` | Databricks SQL connector (preferred), PySpark (fallback), **`databricks-sqlalchemy`**, `sqlglot` | Databricks via `DATABRICKS_*` / related aliases (see **[API_REFERENCE.md](https://github.com/akul-ameya/aetherdialect/blob/main/API_REFERENCE.md)**) |
 **SQL parsing for validation:** PostgreSQL uses **`pglast`** for structural AST checks (join pairs, CTE bodies, `ast_validate`). Databricks / Spark SQL uses **`sqlglot`** with the **Spark** dialect.
@@ -77,9 +78,9 @@ t2s = Text2SQL(
 t2s.run_interactive()
 ```
-Set database and LLM variables in the process environment or in **`env_file`**. The full matrix is in **[API_REFERENCE.md](API_REFERENCE.md)**. Pass **`artifacts_dir=`** so artifacts are written under `<root>/text2sql`; when omitted, a platform user-data directory is used.
+Set database and LLM variables in the process environment or in **`env_file`**. The full matrix is in **[API_REFERENCE.md](https://github.com/akul-ameya/aetherdialect/blob/main/API_REFERENCE.md)**. Pass **`artifacts_dir=`** so artifacts are written under `<root>/text2sql`; when omitted, a platform user-data directory is used.
-**Interactive two ways:** **`run_interactive()`** is a stdin loop. For your own UI or protocol, use **`Text2SQL.pipeline_session()`** with **`PipelineSession.ask`** and **`PipelineSession.step`**, which return **`SessionStep`** until **`done`** is true. Details are in **[API_REFERENCE.md](API_REFERENCE.md)**.
+**Interactive two ways:** **`run_interactive()`** is a stdin loop. For your own UI or protocol, use **`Text2SQL.pipeline_session()`** with **`PipelineSession.ask`** and **`PipelineSession.step`**, which return **`SessionStep`** until **`done`** is true. Details are in **[API_REFERENCE.md](https://github.com/akul-ameya/aetherdialect/blob/main/API_REFERENCE.md)**.
 ---
@@ -118,7 +119,7 @@ A **validation-first** layer for **stable business schemas** and **repeated anal
 ## LLM: three fixed models
-The library uses **three** named models internally. On **Azure OpenAI** you expose **three deployments** whose default names match those internal names, or you map each name to your deployment with optional env vars (**[API_REFERENCE.md](API_REFERENCE.md)** lists the exact strings and variables).
+The library uses **three** named models internally. On **Azure OpenAI** you expose **three deployments** whose default names match those internal names, or you map each name to your deployment with optional env vars (**[API_REFERENCE.md](https://github.com/akul-ameya/aetherdialect/blob/main/API_REFERENCE.md)** lists the exact strings and variables).
 When `Text2SQL` is constructed, a **short completion** is sent once per **distinct** configured model name (OpenAI) or deployment name (Azure), so bad keys, endpoints, or deployment maps fail immediately with **`ConfigError`** instead of halfway through a session.
@@ -132,7 +133,7 @@ Treat credentials as you would for any read-only analyst account.
 - The engine needs to **reflect** the tables or views in your **`SchemaContext`**, run **`SELECT`** (and **`EXPLAIN`**) on generated queries, and execute the paths you enable (interactive display, warmup, etc.).
 - **Least privilege** is recommended: a role limited to **`SELECT`** (and whatever your database requires for **`EXPLAIN`**) on the objects you include. The library enforces an analytical **`SELECT`**-only policy in generated SQL, but that is **not** a substitute for database- and network-level security.
-- **Scope** matters: allow/deny lists and `include` settings restrict what is visible; they also feed **fingerprinting** so template stores stay aligned when you change scope (**[API_REFERENCE.md](API_REFERENCE.md)**).
+- **Scope** matters: allow/deny lists and `include` settings restrict what is visible; they also feed **fingerprinting** so template stores stay aligned when you change scope (**[API_REFERENCE.md](https://github.com/akul-ameya/aetherdialect/blob/main/API_REFERENCE.md)**).
 ---
@@ -140,7 +141,7 @@ Treat credentials as you would for any read-only analyst account.
 Everything learned or cached for a connection “shape” lives under **`artifacts_dir`** (see Quickstart): resolved to **`<root>/text2sql`**, or a platform user-data directory if you omit **`artifacts_dir`**. That folder holds the **schema snapshot**, **template store**, **QSim skeletons**, seed-warmup cache, and a small **manifest** of fingerprints—not your raw database.
-Each time **`Text2SQL(...)`** runs, the **live** schema graph is compared to the **stored** manifest. The outcome is one of four **migration tiers** (construction step 6 and the table below; for a conceptual walkthrough see **[OVERVIEW.md § Migration tiers](OVERVIEW.md#5-migration-tiers-what-happens-when-your-schema-changes)**):
+Each time **`Text2SQL(...)`** runs, the **live** schema graph is compared to the **stored** manifest. The outcome is one of four **migration tiers** (construction step 6 and the table below; for a conceptual walkthrough see **[OVERVIEW.md § Migration tiers](https://github.com/akul-ameya/aetherdialect/blob/main/OVERVIEW.md#5-migration-tiers-what-happens-when-your-schema-changes)**):
 | Tier             | What it means for you                                                                                                                                                                                                                                                                                                                                                                                                                                   |
 | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -161,7 +162,7 @@ After construction, the migration outcome is printed when non-trivial; use **`Te
 ## How to improve results (without touching code)
 - **Schema quality:** **declared foreign keys** in the database, sensible types, and a stable star/snowflake-style layout are ideal. If production metadata is thin, the graph still gains **inferred FK-style links** from naming conventions where those rules apply, plus **semantic join neighbors** from profiled column value overlap—so imperfect warehouses get extra join signal, not only whatever the catalog declared.
-- **Domain language:** optional **notes** file (`SchemaContext.notes_path`) and concrete questions (entities, time range, grain) improve routing and SQL quality; see the opening paragraph of this README.
+- **Domain language:** optional **notes** file (`SchemaContext.notes_file`) and concrete questions (entities, time range, grain) improve routing and SQL quality; see the opening paragraph of this README.
 - **Scope:** use **allow/deny** and **`include`** deliberately so the graph matches how analysts think about the warehouse; changing scope changes fingerprints and can trigger migration.
 - **Operational learning:** accept good SQL; when you reject, answer the **“what was wrong?”** prompt when it appears—the reason improves negative learning (see above). Use **seed warmup** or **QSim** to broaden template coverage in a controlled way.
@@ -191,7 +192,7 @@ After construction, the migration outcome is printed when non-trivial; use **`Te
 - Cached schema snapshot per connection fingerprint so restarts avoid re-reflecting unchanged databases.
 - **Table roles** (e.g. fact vs dimension), **column roles** (measure, categorical, temporal, identifier, etc.), **filter / aggregation / HAVING** allowances per column, **value domains** from profiling — all assigned when the graph is built (reflection, DDL, profiling, and optional notes).
 - Profiling captures the **mode frequency** for each column. When one value occupies ≥99% of the non-null distribution (a sentinel like `0`, `-1`, or `'Unknown'`) the column is hidden from the LLM by the same gate as columns that are ≥99% null — sentinel-dominated columns carry no useful filter or grouping signal.
-- Optional **human notes** (plain text), via **`SchemaContext.notes_path`** (see **[API_REFERENCE.md](API_REFERENCE.md)**): merged when the graph is built or when notes change; if the cache already contains notes and you omit `notes_path` on a later run, cached roles and hints are kept.
+- Optional **human notes** (plain text), via **`SchemaContext.notes_file`** (see **[API_REFERENCE.md](https://github.com/akul-ameya/aetherdialect/blob/main/API_REFERENCE.md)**): merged when the graph is built or when notes change; if the cache already contains notes and you omit `notes_file` on a later run, cached roles and hints are kept.
 - Optional **`deny_columns`** and **`allow_objects`** in **`SchemaContext`**; they participate in **scope hashing** so template stores reconcile when scope changes. Each `deny_columns` entry is either a qualified **`"table.column"`** (denies that exact column) or a bare **`"column"`** name (denies that column name on every table where it appears — qualify if you want one-table scope). Denied columns are hidden from the LLM context and rejected anywhere they would appear in the IR (bare select, filter, `GROUP BY`, `HAVING`, `ORDER BY`, aggregate).
 - Optional **`allow_columns`** in **`SchemaContext`** complements `deny_columns`: when non-empty, only the listed columns survive reflection. Same grammar as `deny_columns` (qualified or bare). **Pragmatic auto-include**: primary key columns and any column appearing in a foreign key edge (source or destination) are always retained so the join graph survives a narrow allow list. Participates in scope hashing.
 - Per-column **`sensitivity`** tag on `ColumnMetadata` accepts `"pii"` or `"restricted"`. Both hide the column from the LLM context. `"pii"` additionally rejects bare select-list projection and `GROUP BY` references; aggregates and equality filters remain available. `"restricted"` hides from the LLM only — IR references that survive other validators are permitted.
@@ -212,7 +213,7 @@ After construction, the migration outcome is printed when non-trivial; use **`Te
 **Operational modes**
-- **Interactive** — ask questions, accept/reject, results export; via **`run_interactive()`** or a programmatic **`PipelineSession`** (see Quickstart above and **[API_REFERENCE.md](API_REFERENCE.md)**).
+- **Interactive** — ask questions, accept/reject, results export; via **`run_interactive()`** or a programmatic **`PipelineSession`** (see Quickstart above and **[API_REFERENCE.md](https://github.com/akul-ameya/aetherdialect/blob/main/API_REFERENCE.md)**).
 - **Seed warmup** — seed questions → gold intents → **deterministic expansion** (many operators, deduplicated) → validate/execute → NL question generation for new templates.
 - **QSim** — reproducible synthetic questions from schema and profiles (seeded randomness).
@@ -250,7 +251,7 @@ After construction, the migration outcome is printed when non-trivial; use **`Te
 - **Accepted templates** — intent fingerprint, parameterized SQL, optional example question, **trust** that rises with validation and falls with rejection.
 - **Rejected templates** (“negative memory”) — failures are stored with **categories** (and optional user **rejection reasons** when collected) so similar bad intents are discouraged on later turns.
 - **Loader reconciliation** — when you open an existing template file, rows that no longer match the current graph (missing tables, columns, or join segments) are pruned, negative memory for removed rejects is cleared, and stale failure-log rows from older hashes are filtered before the store is saved for the current scope. Large fingerprint jumps are handled by the **migration** path above, not only this incremental prune.
-- Persistence lives next to the manifest under your **`artifacts_dir`** tree (**[API_REFERENCE.md](API_REFERENCE.md)**); back it up or reset it as described under **Artifacts and migration**.
+- Persistence lives next to the manifest under your **`artifacts_dir`** tree (**[API_REFERENCE.md](https://github.com/akul-ameya/aetherdialect/blob/main/API_REFERENCE.md)**); back it up or reset it as described under **Artifacts and migration**.
 ---
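
The README text in the hunk at `-191,7 +192,7` describes a visibility gate: a column is hidden from the LLM when it is ≥99% null, or when a single value occupies ≥99% of its non-null distribution. The following is a minimal sketch of that rule, not the package's implementation. The function name `is_hidden_from_llm`, the `threshold` parameter, and the treatment of an empty column are assumptions made for illustration.

```python
from collections import Counter


def is_hidden_from_llm(values, threshold=0.99):
    """Hypothetical helper: apply the >=99% null / >=99% mode gate to one column.

    `values` is a list of sampled column values, with None standing in for NULL.
    """
    total = len(values)
    if total == 0:
        # Assumption: a column with no sampled rows carries no signal.
        return True

    non_null = [v for v in values if v is not None]

    # Gate 1: the column is (almost) entirely NULL.
    null_share = (total - len(non_null)) / total
    if null_share >= threshold:
        return True

    # Gate 2: one sentinel value dominates the non-null distribution.
    _, mode_count = Counter(non_null).most_common(1)[0]
    return mode_count / len(non_null) >= threshold


if __name__ == "__main__":
    sentinel_heavy = [0] * 995 + [1, 2, 3, 4, 5]
    print(is_hidden_from_llm(sentinel_heavy))
    print(is_hidden_from_llm(list(range(100))))
```

Note that the mode share is computed over non-null values only, which matches the README's wording ("of the non-null distribution"): a column that is half NULL and half `'Unknown'` would still be hidden under this sketch.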

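
The same hunk states the `deny_columns` grammar: an entry is either a qualified `"table.column"`, which denies that exact column, or a bare `"column"`, which denies that name on every table where it appears. A rough sketch of that matching rule follows. The helper `is_denied` is hypothetical, and exact case-sensitive comparison and splitting on the last dot are assumptions; the package may normalize names differently.

```python
def is_denied(table, column, deny_columns):
    """Hypothetical helper: check one column against a deny list.

    Each entry is either "table.column" (one-table scope) or a bare
    "column" name (applies to every table).
    """
    for entry in deny_columns:
        if "." in entry:
            # Qualified entry: split on the last dot so a schema-qualified
            # table name such as "sales.orders" stays intact (assumption).
            entry_table, entry_column = entry.rsplit(".", 1)
            if entry_table == table and entry_column == column:
                return True
        elif entry == column:
            # Bare entry: matches this column name on any table.
            return True
    return False


if __name__ == "__main__":
    deny = ["orders.email", "ssn"]
    print(is_denied("orders", "email", deny))
    print(is_denied("customers", "email", deny))
    print(is_denied("customers", "ssn", deny))
```

Under the README's description, `allow_columns` uses the same entry grammar, so a matcher of this shape could serve both lists, with the auto-include of primary-key and foreign-key columns applied on top.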