hotdata-ibis 0.1.1__tar.gz

@@ -0,0 +1,176 @@
1
+ Metadata-Version: 2.4
2
+ Name: hotdata-ibis
3
+ Version: 0.1.1
4
+ Summary: Ibis backend for Hotdata federated SQL API (depends on the hotdata SDK only; not hotdata-runtime)
5
+ Author: Hotdata Ibis contributors
6
+ License: Apache-2.0
7
+ Project-URL: Documentation, https://www.hotdata.dev/docs/api-reference
8
+ Project-URL: Ibis, https://ibis-project.org/
9
+ Keywords: ibis,hotdata,sql,dataframe
10
+ Classifier: Development Status :: 3 - Alpha
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: License :: OSI Approved :: Apache Software License
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.10
15
+ Classifier: Programming Language :: Python :: 3.11
16
+ Classifier: Programming Language :: Python :: 3.12
17
+ Classifier: Programming Language :: Python :: 3.13
18
+ Classifier: Programming Language :: Python :: 3.14
19
+ Requires-Python: >=3.10
20
+ Description-Content-Type: text/markdown
21
+ Requires-Dist: ibis-framework<11,>=10.0
22
+ Requires-Dist: hotdata>=0.2.0
23
+ Requires-Dist: pyarrow>=15
24
+ Requires-Dist: pyarrow-hotfix>=0.6
25
+ Requires-Dist: pandas>=2
26
+ Requires-Dist: sqlglot>=24
27
+
28
+ # hotdata-ibis
29
+
30
+ Experimental [Ibis](https://ibis-project.org/) backend for [Hotdata](https://www.hotdata.dev/docs/api-reference): compile expressions with Ibis, run federated SQL over the Hotdata API. REST calls use the official **[hotdata](https://github.com/hotdata-dev/sdk-python)** Python SDK. Repo examples use **httpx** (listed under the **dev** dependency group).
31
+
32
+ **Requirements:** Python 3.10+, **ibis-framework** 10.x, **hotdata** ≥0.2.
33
+
34
+ ## Install
35
+
36
+ ```bash
37
+ uv pip install hotdata-ibis
38
+ # or: python -m pip install hotdata-ibis
39
+ ```
40
+
41
+ ## Features
42
+
43
+ - **Ibis connection API** — connect with `ibis.hotdata.connect(...)` or `ibis.connect("hotdata://...")`.
44
+ - **Hotdata catalog mapping** — expose Hotdata connections, schemas, and tables through Ibis catalogs, databases, and tables.
45
+ - **SQL-backed expression execution** — compile Ibis expressions with the Postgres SQLGlot compiler and execute them through Hotdata query APIs.
46
+ - **Typed table discovery** — load schema metadata from Hotdata information schema and map SQL types into Ibis types.
47
+ - **Arrow and pandas results** — materialize expressions as pandas DataFrames, PyArrow tables, or local Arrow record batches.
48
+ - **Raw SQL escape hatch** — use `con.sql(..., dialect="postgres")` when Hotdata-specific federated SQL is clearer than modeled Ibis expressions.
49
+ - **Managed database writes** — create managed connections with `create_database`, load local pandas or PyArrow data through `create_table`, and clean up with `drop_table` / `drop_database`.
50
+
51
+ ## Connect
52
+
53
+ Programmatic API:
54
+
55
+ ```python
56
+ import ibis
57
+
58
+ con = ibis.hotdata.connect(
59
+ api_url="https://api.hotdata.dev",
60
+ token="YOUR_API_TOKEN",
61
+ workspace_id="ws_…",
62
+ session_id=None, # optional: X-Session-Id (sandbox)
63
+ verify_ssl=True,
64
+ timeout=120.0,
65
+ default_connection=None, # Hotdata connection id → Ibis catalog
66
+ default_schema=None, # remote schema → Ibis database
67
+ poll_interval_s=0.25,
68
+ poll_timeout_s=600.0,
69
+ )
70
+ ```
71
+
72
+ URL style (token may live in the query string or the URL “password” segment):
73
+
74
+ ```python
75
+ con = ibis.connect(
76
+ "hotdata://api.hotdata.dev/?token=…&workspace_id=ws_…&verify_ssl=true"
77
+ )
78
+ ```
79
+
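To make the two token placements concrete, here is a minimal sketch of how such a URL could be split into `connect()`-style keyword arguments with the standard library. The function name `parse_hotdata_url`, the assumption that the host maps to an `https://` API URL, and the accepted spellings of `verify_ssl` are illustrative and not taken from the backend's source.

```python
from urllib.parse import parse_qs, urlsplit


def parse_hotdata_url(url: str) -> dict:
    """Split a hotdata:// URL into connect()-style kwargs (illustrative only)."""
    parts = urlsplit(url)
    if parts.scheme != "hotdata":
        raise ValueError(f"expected a hotdata:// URL, got {parts.scheme!r}")

    # Keep the last value when a query parameter is repeated.
    query = {key: values[-1] for key, values in parse_qs(parts.query).items()}

    # Prefer an explicit ?token=...; otherwise use the URL "password" segment.
    token = query.pop("token", None) or parts.password

    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"

    # Assumption: the API is served over HTTPS on the given host.
    kwargs = {"api_url": f"https://{host}", "token": token}
    if "verify_ssl" in query:
        kwargs["verify_ssl"] = query.pop("verify_ssl").lower() in {"1", "true", "yes"}
    kwargs.update(query)  # e.g. workspace_id, default_connection, default_schema
    return kwargs
```

With this sketch, `hotdata://:SECRET@api.hotdata.dev/?workspace_id=ws_1` and `hotdata://api.hotdata.dev/?token=SECRET&workspace_id=ws_1` yield the same token.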
80
+ **Mapping:** Ibis **catalog** = Hotdata connection id; **database** = remote schema; **table** = table name. SQL references look like `connection.schema.table`. With a single connection and schema, defaults are inferred; otherwise set `default_connection` / `default_schema` or qualify `con.table(..., database=(conn_id, schema))`.
81
+
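The default-inference rule in the mapping paragraph can be sketched as a small pure-Python function. This is a simplified model, not the backend's implementation: `qualify_table` and the `connections` dictionary (connection id to list of schema names) are hypothetical stand-ins for what catalog discovery would return, and the raised exception types are assumptions.

```python
def qualify_table(name, connections, database=None,
                  default_connection=None, default_schema=None):
    """Return a `connection.schema.table` reference (illustrative model only)."""
    if database is not None:
        # An explicit (catalog, database) pair takes precedence over defaults.
        conn_id, schema = database
    else:
        conn_id, schema = default_connection, default_schema

    if conn_id is None:
        # Infer the connection only when exactly one exists.
        if len(connections) != 1:
            raise ValueError("several connections: set default_connection or pass database=")
        (conn_id,) = connections

    if schema is None:
        # Infer the schema only when the connection has exactly one.
        schemas = connections[conn_id]
        if len(schemas) != 1:
            raise ValueError("several schemas: set default_schema or pass database=")
        (schema,) = schemas

    return f"{conn_id}.{schema}.{name}"
```

For a workspace with one connection `tpch` and one schema `tpch_sf1`, `qualify_table("customer", {"tpch": ["tpch_sf1"]})` gives `tpch.tpch_sf1.customer`; with several connections the call fails until a default or an explicit `database=` is supplied.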
82
+ **Execution:** SQL is compiled with Ibis’s **Postgres** SQLGlot compiler. The client submits queries asynchronously with `POST /v1/query`, polls `GET /v1/query-runs/{id}`, then downloads ready results as Arrow IPC from `GET /v1/results/{id}`. Tuning: `poll_interval_s`, `poll_timeout_s` on `connect()`.
83
+
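The submit, poll, download sequence is a standard polling loop. The sketch below shows how `poll_interval_s` and `poll_timeout_s` would typically interact; it is not the backend's code. The `get_status` callable, the status strings (`"running"`, `"succeeded"`, `"failed"`), and the `result_id` key are assumptions made for illustration, since the README does not describe the response payloads.

```python
import time


def wait_for_result(get_status, poll_interval_s=0.25, poll_timeout_s=600.0):
    """Poll `get_status()` until the run finishes or the timeout elapses.

    `get_status` is a hypothetical callable returning a dict such as
    {"status": "running"} or {"status": "succeeded", "result_id": "..."}.
    """
    deadline = time.monotonic() + poll_timeout_s
    while True:
        run = get_status()
        if run["status"] == "succeeded":
            return run["result_id"]  # id to download as Arrow IPC
        if run["status"] == "failed":
            raise RuntimeError(run.get("error", "query failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query still {run['status']!r} after {poll_timeout_s}s")
        time.sleep(poll_interval_s)
```

A shorter `poll_interval_s` lowers latency for fast queries at the cost of more status requests; `poll_timeout_s` bounds how long a slow federated query can block the caller.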
84
+ **Types:** Typed tables come from Hotdata’s information schema. `con.sql(...)` types are inferred from a small preview query and Arrow schema; see [Hotdata SQL](https://www.hotdata.dev/docs/sql) for server behavior.
85
+
86
+ ## Ibis Support Overview
87
+
88
+ `hotdata-ibis` is a read-oriented SQL backend. It is useful for exploring Hotdata workspaces with Ibis expressions, running federated SQL, and materializing results locally, but it is not a full mutable database backend.
89
+
90
+ Supported today:
91
+
92
+ - **Connection setup:** `ibis.hotdata.connect(...)` and `ibis.connect("hotdata://...")` with token, workspace, optional sandbox session, TLS, timeout, and polling settings.
93
+ - **Catalog discovery:** `list_catalogs`, `list_databases`, `list_tables`, `current_catalog`, and `current_database` map Hotdata connections and remote schemas into Ibis' catalog/database/table hierarchy.
94
+ - **Table schemas:** `con.table(...)` uses Hotdata information schema column metadata and maps SQL types through Ibis' Postgres type parser.
95
+ - **SQL-backed expressions:** Ibis expressions compile with the Postgres SQLGlot compiler and execute through Hotdata. Common `SELECT` workloads such as projection, filtering, joins, grouping, aggregation, ordering, limits, scalar expressions, and `con.sql(...)` work when the generated SQL is accepted by Hotdata.
96
+ - **Result materialization:** `.execute()` returns pandas objects. `.to_pyarrow()` and `.to_pyarrow_batches()` use the Arrow IPC result data exposed by Hotdata without converting through JSON rows; batches are split locally after the result is downloaded.
97
+ - **Raw SQL escape hatch:** `con.sql("SELECT ...", dialect="postgres")` is the most reliable way to use Hotdata-specific federated table names or SQL that Ibis does not model directly.
98
+ - **Managed database lifecycle:** `create_database("sales", schema="public", tables=["orders"])` registers a managed connection (Ibis catalog). `create_table("orders", pandas_df, database=("sales", "public"))` uploads Parquet and loads it with replace mode. Query as `sales.public.orders` in SQL. `drop_table` clears a managed table; `drop_database` deletes the connection.
99
+ - **Parquet uploads:** `create_table` accepts pandas DataFrames, PyArrow tables, or schema-only empty tables. Tables must live in a managed connection — declare them with `create_database(..., tables=[...])` first. Loads always use replace mode; pass `overwrite=True` to replace an existing synced table (the default `overwrite=False` raises if the table already exists).
100
+
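The managed-table rules listed above (tables must be declared first, loads replace existing data, `overwrite=False` refuses to touch an existing table) can be modeled with a toy in-memory class. `ManagedCatalog` is purely illustrative: it does not call Hotdata, and the exception types are assumptions rather than what the backend actually raises.

```python
class ManagedCatalog:
    """Toy in-memory stand-in for one managed connection (not the real backend)."""

    def __init__(self, schema, tables):
        self.schema = schema
        self.declared = set(tables)  # names passed as tables=[...]
        self.loaded = {}             # table name -> list of rows

    def create_table(self, name, rows, overwrite=False):
        if name not in self.declared:
            raise KeyError(f"{name!r} was not declared when the catalog was created")
        if name in self.loaded and not overwrite:
            raise ValueError(f"{name!r} already exists; pass overwrite=True to replace it")
        # Replace mode: previous contents are discarded, never appended to.
        self.loaded[name] = list(rows)

    def drop_table(self, name):
        # Clears the table's data; the declaration itself stays in place.
        self.loaded.pop(name, None)
```

In this model a second `create_table("orders", ...)` call fails unless `overwrite=True` is passed, and a successful overwrite leaves only the new rows.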
101
+ Not supported as full Ibis backend features:
102
+
103
+ - **General DDL and mutations:** Arbitrary remote DDL, inserts, updates, deletes, and schema-altering operations on external connections are not implemented. Managed-database writes are limited to `create_database`, `create_table`, `drop_table`, and `drop_database` as described above.
104
+ - **Temporary tables and in-memory registration:** `supports_temporary_tables` is false, and in-memory tables are not uploaded automatically for joins.
105
+ - **Python UDFs:** `supports_python_udfs` is false.
106
+ - **Transactions and sessions as database state:** Hotdata sandbox sessions can be passed as `session_id`, but the backend does not expose transaction APIs.
107
+ - **Backend-native SQL dialect:** Compilation uses Ibis' Postgres dialect as the closest fit. Hotdata SQL and federation rules are authoritative, so not every Ibis expression that compiles is guaranteed to execute remotely.
108
+ - **Complete Ibis compliance:** The backend is experimental and has focused test coverage for connection, discovery, schema mapping, execution, uploads, and Arrow results. It has not yet been validated against the full Ibis backend test suite.
109
+ - **Hotdata platform APIs beyond SQL and managed databases:** embeddings, indexes, query history management, sandbox lifecycle management, and other Hotdata-specific APIs are outside the Ibis backend surface.
110
+
111
+ ## Development
112
+
113
+ ```bash
114
+ uv sync # installs dev group by default (pytest, ruff, httpx for examples)
115
+ uv run pytest
116
+ uv run ruff check src tests examples
117
+ ```
118
+
119
+ Lockfile CI: `uv sync --locked && uv run pytest`.
120
+
121
+ ## TPC-H for the examples
122
+
123
+ Examples assume something like **`tpch.tpch_sf1.customer`**. Provision TPC-H in your workspace (commonly a **DuckDB** connection, then DuckDB’s `tpch` extension and `CALL dbgen(sf = 1)` — see [DuckDB TPC-H](https://www.duckdb.org/docs/current/core_extensions/tpch.html) and [Hotdata Quick Start](https://www.hotdata.dev/docs/quick-start)). If your data lives under `main` instead, pass `--default-schema` / `--default-connection` or set `HOTDATA_DEFAULT_*` (see `examples/_helpers.py`).
124
+
125
+ ## Examples
126
+
127
+ The example scripts require the `HOTDATA_API_KEY` and `HOTDATA_WORKSPACE` environment variables.
128
+
129
+ ```bash
130
+ uv sync
131
+ export HOTDATA_API_KEY=…
132
+ export HOTDATA_WORKSPACE=…
133
+ uv run python examples/01_catalog_introspection.py
134
+ uv run python examples/02_execute_sql.py 'SELECT COUNT(*) AS n FROM tpch.tpch_sf1.customer'
135
+ uv run python examples/03_connect_via_url.py
136
+ uv run python examples/04_ibis_table_workflows.py
137
+ ```
138
+
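A script that depends on these two variables can fail fast when one is missing. The helper below is a hypothetical sketch and is not the contents of `examples/_helpers.py`; in particular, mapping `HOTDATA_API_KEY` to `token` and `HOTDATA_WORKSPACE` to `workspace_id` is an assumption based on the `connect()` parameters shown earlier.

```python
import os

REQUIRED_VARS = ("HOTDATA_API_KEY", "HOTDATA_WORKSPACE")


def hotdata_connect_kwargs(environ=os.environ):
    """Build connect() keyword arguments from the environment (illustrative only)."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        # Exit with a readable message instead of a KeyError traceback.
        raise SystemExit("missing environment variables: " + ", ".join(missing))
    return {
        "token": environ["HOTDATA_API_KEY"],
        "workspace_id": environ["HOTDATA_WORKSPACE"],
    }
```

The result could then be passed as `ibis.hotdata.connect(**hotdata_connect_kwargs())`, with the remaining parameters left at their defaults.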
139
+ ### Ibis tables → pandas DataFrames
140
+
141
+ Calling **`.execute()`** on a table expression runs the compiled SQL on Hotdata and returns a **pandas** `DataFrame` (Ibis’s default for this backend).
142
+
143
+ Hotdata’s SQL often uses a **federated prefix** (for example `tpch.tpch_sf1`) that may not match the Ibis **catalog** string (the connection id). A reliable pattern is to start from **`con.sql("SELECT * FROM tpch.tpch_sf1.mytable", dialect="postgres")`**, then chain filters and aggregates—see **`examples/04_ibis_table_workflows.py`**.
144
+
145
+ When **`con.table("mytable")`** is enough (single connection/schema and names align with compiled SQL), the same operations apply:
146
+
147
+ ```python
148
+ t = con.table("customer") # or con.table("customer", database=(conn_id, "tpch_sf1"))
149
+
150
+ df = (
151
+ t.filter(t.c_mktsegment == "AUTOMOBILE")
152
+ .select("c_custkey", "c_name")
153
+ .limit(100)
154
+ .execute()
155
+ )
156
+
157
+ by_seg = t.group_by(t.c_mktsegment).agg(n=t.count()).execute()
158
+
159
+ o = con.table("orders")
160
+ orders_with_names = (
161
+ t.join(o, t.c_custkey == o.o_custkey)
162
+ .select(t.c_name, o.o_totalprice)
163
+ .limit(50)
164
+ .execute()
165
+ )
166
+
167
+ total = t.c_acctbal.sum().execute()
168
+ ```
169
+
170
+ Other useful paths: **`.to_pyarrow()`** / **`.to_pyarrow_batches()`** for Arrow; **`con.sql("SELECT …", dialect="postgres")`** then chain the returned table expression.
171
+
172
+ ## References
173
+
174
+ - [Hotdata Python SDK](https://github.com/hotdata-dev/sdk-python)
175
+ - [Hotdata API](https://www.hotdata.dev/docs/api-reference) · [Hotdata SQL](https://www.hotdata.dev/docs/sql)
176
+ - [Ibis](https://ibis-project.org/) · [Ibis backend hierarchy](https://ibis-project.org/concepts/backend-table-hierarchy.qmd)
@@ -0,0 +1,149 @@
1
+ # hotdata-ibis
2
+
3
+ Experimental [Ibis](https://ibis-project.org/) backend for [Hotdata](https://www.hotdata.dev/docs/api-reference): compile expressions with Ibis, run federated SQL over the Hotdata API. REST calls use the official **[hotdata](https://github.com/hotdata-dev/sdk-python)** Python SDK. Repo examples use **httpx** (listed under the **dev** dependency group).
4
+
5
+ **Requirements:** Python 3.10+, **ibis-framework** 10.x, **hotdata** ≥0.2.
6
+
7
+ ## Install
8
+
9
+ ```bash
10
+ uv pip install hotdata-ibis
11
+ # or: python -m pip install hotdata-ibis
12
+ ```
13
+
14
+ ## Features
15
+
16
+ - **Ibis connection API** — connect with `ibis.hotdata.connect(...)` or `ibis.connect("hotdata://...")`.
17
+ - **Hotdata catalog mapping** — expose Hotdata connections, schemas, and tables through Ibis catalogs, databases, and tables.
18
+ - **SQL-backed expression execution** — compile Ibis expressions with the Postgres SQLGlot compiler and execute them through Hotdata query APIs.
19
+ - **Typed table discovery** — load schema metadata from Hotdata information schema and map SQL types into Ibis types.
20
+ - **Arrow and pandas results** — materialize expressions as pandas DataFrames, PyArrow tables, or local Arrow record batches.
21
+ - **Raw SQL escape hatch** — use `con.sql(..., dialect="postgres")` when Hotdata-specific federated SQL is clearer than modeled Ibis expressions.
22
+ - **Managed database writes** — create managed connections with `create_database`, load local pandas or PyArrow data through `create_table`, and clean up with `drop_table` / `drop_database`.
23
+
24
+ ## Connect
25
+
26
+ Programmatic API:
27
+
28
+ ```python
29
+ import ibis
30
+
31
+ con = ibis.hotdata.connect(
32
+ api_url="https://api.hotdata.dev",
33
+ token="YOUR_API_TOKEN",
34
+ workspace_id="ws_…",
35
+ session_id=None, # optional: X-Session-Id (sandbox)
36
+ verify_ssl=True,
37
+ timeout=120.0,
38
+ default_connection=None, # Hotdata connection id → Ibis catalog
39
+ default_schema=None, # remote schema → Ibis database
40
+ poll_interval_s=0.25,
41
+ poll_timeout_s=600.0,
42
+ )
43
+ ```
44
+
45
+ URL style (token may live in the query string or the URL “password” segment):
46
+
47
+ ```python
48
+ con = ibis.connect(
49
+ "hotdata://api.hotdata.dev/?token=…&workspace_id=ws_…&verify_ssl=true"
50
+ )
51
+ ```
52
+
53
+ **Mapping:** Ibis **catalog** = Hotdata connection id; **database** = remote schema; **table** = table name. SQL references look like `connection.schema.table`. With a single connection and schema, defaults are inferred; otherwise set `default_connection` / `default_schema` or qualify `con.table(..., database=(conn_id, schema))`.
54
+
55
+ **Execution:** SQL is compiled with Ibis’s **Postgres** SQLGlot compiler. The client submits queries asynchronously with `POST /v1/query`, polls `GET /v1/query-runs/{id}`, then downloads ready results as Arrow IPC from `GET /v1/results/{id}`. Tuning: `poll_interval_s`, `poll_timeout_s` on `connect()`.
56
+
57
+ **Types:** Typed tables come from Hotdata’s information schema. `con.sql(...)` types are inferred from a small preview query and Arrow schema; see [Hotdata SQL](https://www.hotdata.dev/docs/sql) for server behavior.
58
+
59
+ ## Ibis Support Overview
60
+
61
+ `hotdata-ibis` is a read-oriented SQL backend. It is useful for exploring Hotdata workspaces with Ibis expressions, running federated SQL, and materializing results locally, but it is not a full mutable database backend.
62
+
63
+ Supported today:
64
+
65
+ - **Connection setup:** `ibis.hotdata.connect(...)` and `ibis.connect("hotdata://...")` with token, workspace, optional sandbox session, TLS, timeout, and polling settings.
66
+ - **Catalog discovery:** `list_catalogs`, `list_databases`, `list_tables`, `current_catalog`, and `current_database` map Hotdata connections and remote schemas into Ibis' catalog/database/table hierarchy.
67
+ - **Table schemas:** `con.table(...)` uses Hotdata information schema column metadata and maps SQL types through Ibis' Postgres type parser.
68
+ - **SQL-backed expressions:** Ibis expressions compile with the Postgres SQLGlot compiler and execute through Hotdata. Common `SELECT` workloads such as projection, filtering, joins, grouping, aggregation, ordering, limits, scalar expressions, and `con.sql(...)` work when the generated SQL is accepted by Hotdata.
69
+ - **Result materialization:** `.execute()` returns pandas objects. `.to_pyarrow()` and `.to_pyarrow_batches()` use the Arrow IPC result data exposed by Hotdata without converting through JSON rows; batches are split locally after the result is downloaded.
70
+ - **Raw SQL escape hatch:** `con.sql("SELECT ...", dialect="postgres")` is the most reliable way to use Hotdata-specific federated table names or SQL that Ibis does not model directly.
71
+ - **Managed database lifecycle:** `create_database("sales", schema="public", tables=["orders"])` registers a managed connection (Ibis catalog). `create_table("orders", pandas_df, database=("sales", "public"))` uploads Parquet and loads it with replace mode. Query as `sales.public.orders` in SQL. `drop_table` clears a managed table; `drop_database` deletes the connection.
72
+ - **Parquet uploads:** `create_table` accepts pandas DataFrames, PyArrow tables, or schema-only empty tables. Tables must live in a managed connection — declare them with `create_database(..., tables=[...])` first. Loads always use replace mode; pass `overwrite=True` to replace an existing synced table (the default `overwrite=False` raises if the table already exists).
73
+
74
+ Not supported as full Ibis backend features:
75
+
76
+ - **General DDL and mutations:** Arbitrary remote DDL, inserts, updates, deletes, and schema-altering operations on external connections are not implemented. Managed-database writes are limited to `create_database`, `create_table`, `drop_table`, and `drop_database` as described above.
77
+ - **Temporary tables and in-memory registration:** `supports_temporary_tables` is false, and in-memory tables are not uploaded automatically for joins.
78
+ - **Python UDFs:** `supports_python_udfs` is false.
79
+ - **Transactions and sessions as database state:** Hotdata sandbox sessions can be passed as `session_id`, but the backend does not expose transaction APIs.
80
+ - **Backend-native SQL dialect:** Compilation uses Ibis' Postgres dialect as the closest fit. Hotdata SQL and federation rules are authoritative, so not every Ibis expression that compiles is guaranteed to execute remotely.
81
+ - **Complete Ibis compliance:** The backend is experimental and has focused test coverage for connection, discovery, schema mapping, execution, uploads, and Arrow results. It has not yet been validated against the full Ibis backend test suite.
82
+ - **Hotdata platform APIs beyond SQL and managed databases:** embeddings, indexes, query history management, sandbox lifecycle management, and other Hotdata-specific APIs are outside the Ibis backend surface.
83
+
84
+ ## Development
85
+
86
+ ```bash
87
+ uv sync # installs dev group by default (pytest, ruff, httpx for examples)
88
+ uv run pytest
89
+ uv run ruff check src tests examples
90
+ ```
91
+
92
+ Lockfile CI: `uv sync --locked && uv run pytest`.
93
+
94
+ ## TPC-H for the examples
95
+
96
+ Examples assume something like **`tpch.tpch_sf1.customer`**. Provision TPC-H in your workspace (commonly a **DuckDB** connection, then DuckDB’s `tpch` extension and `CALL dbgen(sf = 1)` — see [DuckDB TPC-H](https://www.duckdb.org/docs/current/core_extensions/tpch.html) and [Hotdata Quick Start](https://www.hotdata.dev/docs/quick-start)). If your data lives under `main` instead, pass `--default-schema` / `--default-connection` or set `HOTDATA_DEFAULT_*` (see `examples/_helpers.py`).
97
+
98
+ ## Examples
99
+
100
+ The example scripts require the `HOTDATA_API_KEY` and `HOTDATA_WORKSPACE` environment variables.
101
+
102
+ ```bash
103
+ uv sync
104
+ export HOTDATA_API_KEY=…
105
+ export HOTDATA_WORKSPACE=…
106
+ uv run python examples/01_catalog_introspection.py
107
+ uv run python examples/02_execute_sql.py 'SELECT COUNT(*) AS n FROM tpch.tpch_sf1.customer'
108
+ uv run python examples/03_connect_via_url.py
109
+ uv run python examples/04_ibis_table_workflows.py
110
+ ```
111
+
112
+ ### Ibis tables → pandas DataFrames
113
+
114
+ Calling **`.execute()`** on a table expression runs the compiled SQL on Hotdata and returns a **pandas** `DataFrame` (Ibis’s default for this backend).
115
+
116
+ Hotdata’s SQL often uses a **federated prefix** (for example `tpch.tpch_sf1`) that may not match the Ibis **catalog** string (the connection id). A reliable pattern is to start from **`con.sql("SELECT * FROM tpch.tpch_sf1.mytable", dialect="postgres")`**, then chain filters and aggregates—see **`examples/04_ibis_table_workflows.py`**.
117
+
118
+ When **`con.table("mytable")`** is enough (single connection/schema and names align with compiled SQL), the same operations apply:
119
+
120
+ ```python
121
+ t = con.table("customer") # or con.table("customer", database=(conn_id, "tpch_sf1"))
122
+
123
+ df = (
124
+ t.filter(t.c_mktsegment == "AUTOMOBILE")
125
+ .select("c_custkey", "c_name")
126
+ .limit(100)
127
+ .execute()
128
+ )
129
+
130
+ by_seg = t.group_by(t.c_mktsegment).agg(n=t.count()).execute()
131
+
132
+ o = con.table("orders")
133
+ orders_with_names = (
134
+ t.join(o, t.c_custkey == o.o_custkey)
135
+ .select(t.c_name, o.o_totalprice)
136
+ .limit(50)
137
+ .execute()
138
+ )
139
+
140
+ total = t.c_acctbal.sum().execute()
141
+ ```
142
+
143
+ Other useful paths: **`.to_pyarrow()`** / **`.to_pyarrow_batches()`** for Arrow; **`con.sql("SELECT …", dialect="postgres")`** then chain the returned table expression.
144
+
145
+ ## References
146
+
147
+ - [Hotdata Python SDK](https://github.com/hotdata-dev/sdk-python)
148
+ - [Hotdata API](https://www.hotdata.dev/docs/api-reference) · [Hotdata SQL](https://www.hotdata.dev/docs/sql)
149
+ - [Ibis](https://ibis-project.org/) · [Ibis backend hierarchy](https://ibis-project.org/concepts/backend-table-hierarchy.qmd)
@@ -0,0 +1,64 @@
1
+ [build-system]
2
+ requires = ["setuptools>=61", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "hotdata-ibis"
7
+ version = "0.1.1"
8
+ description = "Ibis backend for Hotdata federated SQL API (depends on the hotdata SDK only; not hotdata-runtime)"
9
+ readme = "README.md"
10
+ requires-python = ">=3.10"
11
+ license = { text = "Apache-2.0" }
12
+ authors = [{ name = "Hotdata Ibis contributors" }]
13
+ keywords = ["ibis", "hotdata", "sql", "dataframe"]
14
+ classifiers = [
15
+ "Development Status :: 3 - Alpha",
16
+ "Intended Audience :: Developers",
17
+ "License :: OSI Approved :: Apache Software License",
18
+ "Programming Language :: Python :: 3",
19
+ "Programming Language :: Python :: 3.10",
20
+ "Programming Language :: Python :: 3.11",
21
+ "Programming Language :: Python :: 3.12",
22
+ "Programming Language :: Python :: 3.13",
23
+ "Programming Language :: Python :: 3.14",
24
+ ]
25
+ dependencies = [
26
+ "ibis-framework>=10.0,<11",
27
+ "hotdata>=0.2.0",
28
+ "pyarrow>=15",
29
+ "pyarrow-hotfix>=0.6",
30
+ "pandas>=2",
31
+ "sqlglot>=24",
32
+ ]
33
+
34
+ [dependency-groups]
35
+ dev = [
36
+ "httpx>=0.27",
37
+ "pytest>=8",
38
+ "pytest-httpserver>=1",
39
+ "ruff>=0.5",
40
+ ]
41
+
42
+ [tool.uv]
43
+ default-groups = ["dev"]
44
+
45
+ [project.urls]
46
+ Documentation = "https://www.hotdata.dev/docs/api-reference"
47
+ "Ibis" = "https://ibis-project.org/"
48
+
49
+ [project.entry-points."ibis.backends"]
50
+ hotdata = "ibis_hotdata"
51
+
52
+ [tool.setuptools.packages.find]
53
+ where = ["src"]
54
+
55
+ [tool.pytest.ini_options]
56
+ testpaths = ["tests"]
57
+
58
+ [tool.ruff]
59
+ line-length = 100
60
+ target-version = "py310"
61
+
62
+ [tool.ruff.lint.per-file-ignores]
63
+ "tests/**/*.py" = ["E402"]
64
+ "examples/**/*.py" = ["E402"]
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,176 @@
1
+ Metadata-Version: 2.4
2
+ Name: hotdata-ibis
3
+ Version: 0.1.1
4
+ Summary: Ibis backend for Hotdata federated SQL API (depends on the hotdata SDK only; not hotdata-runtime)
5
+ Author: Hotdata Ibis contributors
6
+ License: Apache-2.0
7
+ Project-URL: Documentation, https://www.hotdata.dev/docs/api-reference
8
+ Project-URL: Ibis, https://ibis-project.org/
9
+ Keywords: ibis,hotdata,sql,dataframe
10
+ Classifier: Development Status :: 3 - Alpha
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: License :: OSI Approved :: Apache Software License
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.10
15
+ Classifier: Programming Language :: Python :: 3.11
16
+ Classifier: Programming Language :: Python :: 3.12
17
+ Classifier: Programming Language :: Python :: 3.13
18
+ Classifier: Programming Language :: Python :: 3.14
19
+ Requires-Python: >=3.10
20
+ Description-Content-Type: text/markdown
21
+ Requires-Dist: ibis-framework<11,>=10.0
22
+ Requires-Dist: hotdata>=0.2.0
23
+ Requires-Dist: pyarrow>=15
24
+ Requires-Dist: pyarrow-hotfix>=0.6
25
+ Requires-Dist: pandas>=2
26
+ Requires-Dist: sqlglot>=24
27
+
28
+ # hotdata-ibis
29
+
30
+ Experimental [Ibis](https://ibis-project.org/) backend for [Hotdata](https://www.hotdata.dev/docs/api-reference): compile expressions with Ibis, run federated SQL over the Hotdata API. REST calls use the official **[hotdata](https://github.com/hotdata-dev/sdk-python)** Python SDK. Repo examples use **httpx** (listed under the **dev** dependency group).
31
+
32
+ **Requirements:** Python 3.10+, **ibis-framework** 10.x, **hotdata** ≥0.2.
33
+
34
+ ## Install
35
+
36
+ ```bash
37
+ uv pip install hotdata-ibis
38
+ # or: python -m pip install hotdata-ibis
39
+ ```
40
+
41
+ ## Features
42
+
43
+ - **Ibis connection API** — connect with `ibis.hotdata.connect(...)` or `ibis.connect("hotdata://...")`.
44
+ - **Hotdata catalog mapping** — expose Hotdata connections, schemas, and tables through Ibis catalogs, databases, and tables.
45
+ - **SQL-backed expression execution** — compile Ibis expressions with the Postgres SQLGlot compiler and execute them through Hotdata query APIs.
46
+ - **Typed table discovery** — load schema metadata from Hotdata information schema and map SQL types into Ibis types.
47
+ - **Arrow and pandas results** — materialize expressions as pandas DataFrames, PyArrow tables, or local Arrow record batches.
48
+ - **Raw SQL escape hatch** — use `con.sql(..., dialect="postgres")` when Hotdata-specific federated SQL is clearer than modeled Ibis expressions.
49
+ - **Managed database writes** — create managed connections with `create_database`, load local pandas or PyArrow data through `create_table`, and clean up with `drop_table` / `drop_database`.
50
+
51
+ ## Connect
52
+
53
+ Programmatic API:
54
+
55
+ ```python
56
+ import ibis
57
+
58
+ con = ibis.hotdata.connect(
59
+ api_url="https://api.hotdata.dev",
60
+ token="YOUR_API_TOKEN",
61
+ workspace_id="ws_…",
62
+ session_id=None, # optional: X-Session-Id (sandbox)
63
+ verify_ssl=True,
64
+ timeout=120.0,
65
+ default_connection=None, # Hotdata connection id → Ibis catalog
66
+ default_schema=None, # remote schema → Ibis database
67
+ poll_interval_s=0.25,
68
+ poll_timeout_s=600.0,
69
+ )
70
+ ```
71
+
72
+ URL style (token may live in the query string or the URL “password” segment):
73
+
74
+ ```python
75
+ con = ibis.connect(
76
+ "hotdata://api.hotdata.dev/?token=…&workspace_id=ws_…&verify_ssl=true"
77
+ )
78
+ ```
79
+
80
+ **Mapping:** Ibis **catalog** = Hotdata connection id; **database** = remote schema; **table** = table name. SQL references look like `connection.schema.table`. With a single connection and schema, defaults are inferred; otherwise set `default_connection` / `default_schema` or qualify `con.table(..., database=(conn_id, schema))`.
81
+
82
+ **Execution:** SQL is compiled with Ibis’s **Postgres** SQLGlot compiler. The client submits queries asynchronously with `POST /v1/query`, polls `GET /v1/query-runs/{id}`, then downloads ready results as Arrow IPC from `GET /v1/results/{id}`. Tuning: `poll_interval_s`, `poll_timeout_s` on `connect()`.
83
+
84
+ **Types:** Typed tables come from Hotdata’s information schema. `con.sql(...)` types are inferred from a small preview query and Arrow schema; see [Hotdata SQL](https://www.hotdata.dev/docs/sql) for server behavior.
85
+
86
+ ## Ibis Support Overview
87
+
88
+ `hotdata-ibis` is a read-oriented SQL backend. It is useful for exploring Hotdata workspaces with Ibis expressions, running federated SQL, and materializing results locally, but it is not a full mutable database backend.
89
+
90
+ Supported today:
91
+
92
+ - **Connection setup:** `ibis.hotdata.connect(...)` and `ibis.connect("hotdata://...")` with token, workspace, optional sandbox session, TLS, timeout, and polling settings.
93
+ - **Catalog discovery:** `list_catalogs`, `list_databases`, `list_tables`, `current_catalog`, and `current_database` map Hotdata connections and remote schemas into Ibis' catalog/database/table hierarchy.
94
+ - **Table schemas:** `con.table(...)` uses Hotdata information schema column metadata and maps SQL types through Ibis' Postgres type parser.
95
+ - **SQL-backed expressions:** Ibis expressions compile with the Postgres SQLGlot compiler and execute through Hotdata. Common `SELECT` workloads such as projection, filtering, joins, grouping, aggregation, ordering, limits, scalar expressions, and `con.sql(...)` work when the generated SQL is accepted by Hotdata.
96
+ - **Result materialization:** `.execute()` returns pandas objects. `.to_pyarrow()` and `.to_pyarrow_batches()` use the Arrow IPC result data exposed by Hotdata without converting through JSON rows; batches are split locally after the result is downloaded.
97
+ - **Raw SQL escape hatch:** `con.sql("SELECT ...", dialect="postgres")` is the most reliable way to use Hotdata-specific federated table names or SQL that Ibis does not model directly.
98
+ - **Managed database lifecycle:** `create_database("sales", schema="public", tables=["orders"])` registers a managed connection (Ibis catalog). `create_table("orders", pandas_df, database=("sales", "public"))` uploads Parquet and loads it with replace mode. Query as `sales.public.orders` in SQL. `drop_table` clears a managed table; `drop_database` deletes the connection.
99
+ - **Parquet uploads:** `create_table` accepts pandas DataFrames, PyArrow tables, or schema-only empty tables. Tables must live in a managed connection — declare them with `create_database(..., tables=[...])` first. Loads always use replace mode; pass `overwrite=True` to replace an existing synced table (the default `overwrite=False` raises if the table already exists).
100
+
101
+ Not supported as full Ibis backend features:
102
+
103
+ - **General DDL and mutations:** Arbitrary remote DDL, inserts, updates, deletes, and schema-altering operations on external connections are not implemented. Managed-database writes are limited to `create_database`, `create_table`, `drop_table`, and `drop_database` as described above.
104
+ - **Temporary tables and in-memory registration:** `supports_temporary_tables` is false, and in-memory tables are not uploaded automatically for joins.
105
+ - **Python UDFs:** `supports_python_udfs` is false.
106
+ - **Transactions and sessions as database state:** Hotdata sandbox sessions can be passed as `session_id`, but the backend does not expose transaction APIs.
107
+ - **Backend-native SQL dialect:** Compilation uses Ibis' Postgres dialect as the closest fit. Hotdata SQL and federation rules are authoritative, so not every Ibis expression that compiles is guaranteed to execute remotely.
108
+ - **Complete Ibis compliance:** The backend is experimental and has focused test coverage for connection, discovery, schema mapping, execution, uploads, and Arrow results. It has not yet been validated against the full Ibis backend test suite.
109
+ - **Hotdata platform APIs beyond SQL and managed databases:** embeddings, indexes, query history management, sandbox lifecycle management, and other Hotdata-specific APIs are outside the Ibis backend surface.
110
+
111
+ ## Development
112
+
113
+ ```bash
114
+ uv sync # installs dev group by default (pytest, ruff, httpx for examples)
115
+ uv run pytest
116
+ uv run ruff check src tests examples
117
+ ```
118
+
119
+ Lockfile CI: `uv sync --locked && uv run pytest`.
120
+
121
+ ## TPC-H for the examples
122
+
123
+ Examples assume a table such as **`tpch.tpch_sf1.customer`**. Provision TPC-H in your workspace, commonly by adding a **DuckDB** connection, loading DuckDB’s `tpch` extension, and running `CALL dbgen(sf = 1)`; see [DuckDB TPC-H](https://www.duckdb.org/docs/current/core_extensions/tpch.html) and [Hotdata Quick Start](https://www.hotdata.dev/docs/quick-start). If your data lives under `main` instead, pass `--default-schema` / `--default-connection` or set `HOTDATA_DEFAULT_*` (see `examples/_helpers.py`).
124
+
125
+ ## Examples
126
+
127
+ The examples require `HOTDATA_API_KEY` and `HOTDATA_WORKSPACE` to be set in the environment.
128
+
129
+ ```bash
130
+ uv sync
131
+ export HOTDATA_API_KEY=…
132
+ export HOTDATA_WORKSPACE=…
133
+ uv run python examples/01_catalog_introspection.py
134
+ uv run python examples/02_execute_sql.py 'SELECT COUNT(*) AS n FROM tpch.tpch_sf1.customer'
135
+ uv run python examples/03_connect_via_url.py
136
+ uv run python examples/04_ibis_table_workflows.py
137
+ ```
138
+
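Before running the scripts it can help to confirm that both variables are present. The sketch below is an illustration only, not code from the repository: `missing_env_vars` is a hypothetical helper, and it takes an optional mapping so it can be exercised without touching the real environment.

```python
import os

# The two variables the examples need, as named in this README.
REQUIRED_VARS = ("HOTDATA_API_KEY", "HOTDATA_WORKSPACE")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Demonstration with an explicit mapping instead of the real environment.
print(missing_env_vars({"HOTDATA_API_KEY": "k"}))  # ['HOTDATA_WORKSPACE']
```

Calling `missing_env_vars()` with no argument checks `os.environ`; an empty list means both variables are set.
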
139
+ ### Ibis tables → pandas DataFrames
140
+
141
+ Calling **`.execute()`** on a table expression runs the compiled SQL on Hotdata and returns a **pandas** `DataFrame` (Ibis’s default for this backend).
142
+
143
+ Hotdata’s SQL often uses a **federated prefix** (for example `tpch.tpch_sf1`) that may not match the Ibis **catalog** string (the connection id). A reliable pattern is to start from **`con.sql("SELECT * FROM tpch.tpch_sf1.mytable", dialect="postgres")`**, then chain filters and aggregates—see **`examples/04_ibis_table_workflows.py`**.
144
+
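The fully qualified query string for that pattern can be assembled from its three parts. This is a minimal sketch under stated assumptions: `federated_select` is a hypothetical helper, not part of hotdata-ibis, and restricting each part to plain unquoted identifiers is a simplification chosen here rather than a Hotdata rule.

```python
import re

# Plain unquoted identifiers only: letters, digits, underscore, not starting with a digit.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def federated_select(connection, schema, table):
    """Build `SELECT * FROM connection.schema.table` from validated parts."""
    parts = (connection, schema, table)
    for part in parts:
        if not _IDENT.fullmatch(part):
            raise ValueError(f"not a plain SQL identifier: {part!r}")
    return "SELECT * FROM " + ".".join(parts)


query = federated_select("tpch", "tpch_sf1", "customer")
print(query)  # SELECT * FROM tpch.tpch_sf1.customer
```

The resulting string is what you would hand to `con.sql(query, dialect="postgres")` before chaining filters and aggregates.
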
145
+ When **`con.table("mytable")`** is enough (single connection/schema and names align with compiled SQL), the same operations apply:
146
+
147
+ ```python
148
+ t = con.table("customer") # or con.table("customer", database=(conn_id, "tpch_sf1"))
149
+
150
+ df = (
151
+ t.filter(t.c_mktsegment == "AUTOMOBILE")
152
+ .select("c_custkey", "c_name")
153
+ .limit(100)
154
+ .execute()
155
+ )
156
+
157
+ by_seg = t.group_by(t.c_mktsegment).agg(n=t.count()).execute()
158
+
159
+ o = con.table("orders")
160
+ orders_with_names = (
161
+ t.join(o, t.c_custkey == o.o_custkey)
162
+ .select(t.c_name, o.o_totalprice)
163
+ .limit(50)
164
+ .execute()
165
+ )
166
+
167
+ total = t.c_acctbal.sum().execute()
168
+ ```
169
+
170
+ Other useful paths: **`.to_pyarrow()`** / **`.to_pyarrow_batches()`** return Arrow data instead of pandas; **`con.sql("SELECT …", dialect="postgres")`** returns a table expression that you can chain further.
171
+
172
+ ## References
173
+
174
+ - [Hotdata Python SDK](https://github.com/hotdata-dev/sdk-python)
175
+ - [Hotdata API](https://www.hotdata.dev/docs/api-reference) · [Hotdata SQL](https://www.hotdata.dev/docs/sql)
176
+ - [Ibis](https://ibis-project.org/) · [Ibis backend hierarchy](https://ibis-project.org/concepts/backend-table-hierarchy)
@@ -0,0 +1,18 @@
1
+ README.md
2
+ pyproject.toml
3
+ src/hotdata_ibis.egg-info/PKG-INFO
4
+ src/hotdata_ibis.egg-info/SOURCES.txt
5
+ src/hotdata_ibis.egg-info/dependency_links.txt
6
+ src/hotdata_ibis.egg-info/entry_points.txt
7
+ src/hotdata_ibis.egg-info/requires.txt
8
+ src/hotdata_ibis.egg-info/top_level.txt
9
+ src/ibis_hotdata/__init__.py
10
+ src/ibis_hotdata/backend.py
11
+ src/ibis_hotdata/http.py
12
+ src/ibis_hotdata/managed.py
13
+ src/ibis_hotdata/types.py
14
+ tests/test_architecture_guardrails.py
15
+ tests/test_hotdata_backend.py
16
+ tests/test_hotdata_http.py
17
+ tests/test_hotdata_types.py
18
+ tests/test_version.py
@@ -0,0 +1,2 @@
1
+ [ibis.backends]
2
+ hotdata = ibis_hotdata
@@ -0,0 +1,6 @@
1
+ ibis-framework<11,>=10.0
2
+ hotdata>=0.2.0
3
+ pyarrow>=15
4
+ pyarrow-hotfix>=0.6
5
+ pandas>=2
6
+ sqlglot>=24
@@ -0,0 +1 @@
1
+ ibis_hotdata
@@ -0,0 +1,10 @@
1
+ from importlib.metadata import PackageNotFoundError, version
2
+
3
+ try:
4
+ __version__ = version("hotdata-ibis")
5
+ except PackageNotFoundError:
6
+ __version__ = "0.0.0+unknown"
7
+
8
+ from ibis_hotdata.backend import Backend
9
+
10
+ __all__ = ["Backend", "__version__"]