PyPI - duckrun - Versions diffs - 0.3.17.dev7__tar.gz → 0.3.19__tar.gz - Mend

duckrun 0.3.17.dev7 tar.gz → 0.3.19 tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

{duckrun-0.3.17.dev7/duckrun.egg-info → duckrun-0.3.19}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: duckrun
-Version: 0.3.17.dev7
+Version: 0.3.19
 Summary: A dbt adapter that runs SQL in DuckDB and materializes to Delta Lake (delta_rs).
 Author: mim
 License: MIT
@@ -12,7 +12,7 @@ Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: dbt-duckdb>=1.8
 Requires-Dist: dbt-core<2.0,>=1.8
-Requires-Dist: duckdb==1.5.4
+Requires-Dist: duckdb<1.6.0,>=1.5.4
 Requires-Dist: deltalake<1.5.1,>=1.5.0
 Requires-Dist: requests
 Provides-Extra: local
@@ -35,9 +35,13 @@ Dynamic: license-file
 > not affiliated with, endorsed by, or supported by any employer or vendor. No warranty —
 > use it at your own risk.
-**duckrun** is a [dbt](https://www.getdbt.com/) adapter that runs your model SQL in
-**DuckDB** and writes the results to **Delta Lake** using
-[`delta_rs`](https://delta-io.github.io/delta-rs/) (the `deltalake` Python package).
+**duckrun** runs SQL in [DuckDB](https://duckdb.org/) and writes
+[**Delta Lake**](https://delta-io.github.io/delta-rs/) via delta_rs. It gives you:
+- a [**dbt**](https://www.getdbt.com/) adapter that materializes models as Delta tables;
+- a **`connect()`** helper to write Delta straight from SQL in a notebook;
+- **full snapshot isolation** from read to write — concurrent writers fail loud, never interleave.
 duckrun itself is just glue — it owns none of the heavy lifting. The real work is done
 by **DuckDB** (executes the SQL), **delta-rs** (writes the Delta table), **Arrow** (the
 zero-copy (kind of) bridge that hands query results from DuckDB to delta-rs), and **dbt** (orchestrates
@@ -67,6 +71,21 @@ pip install duckrun
 That single install pulls in `dbt-duckdb` (and therefore `duckdb`) plus `deltalake`.
+### In a Microsoft Fabric Python notebook
+duckrun needs `duckdb` ≥ 1.5.4 — the release where `delta_scan` gained its `version => N`
+parameter, which duckrun uses for snapshot-pinned reads. Fabric notebooks ship a **stable**
+`duckdb` release, which trails the newest one, so the `duckdb` already loaded in the kernel may
+predate 1.5.4. Upgrade, then restart the Python kernel so the new version loads.
+```python
+!pip install duckrun --upgrade
+notebookutils.session.restartPython()
+```
+If you skip the restart, duckrun fails loud at `connect()` (and on `dbt run`) and tells you to
+restart — it won't quietly run on the older `duckdb`/`deltalake` still bound in the kernel.
 ## Configure your profile
 ```yaml
@@ -79,12 +98,22 @@ my_project:
       # No `threads:` needed — duckrun always runs single-threaded.
       # DuckDB runs in-memory by default — the Delta tables are the only state.
       # Default Delta location for models that don't set config(location=...).
-      root_path: './warehouse'   # local path, or s3://..., gs://..., abfss://...
+      # OneLake — address by GUID, not friendly names (see "OneLake: use GUID paths" below):
+      root_path: "abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables"
+      # Or any other store: './warehouse' (local), 's3://...', 'gs://...'.
       # storage_options: {}      # passed through to deltalake for remote stores
 ```
 Persisted models are written to `<root_path>/<schema>/<model>` (e.g.
-`./warehouse/dbo/orders`), or to an explicit `config(location=...)`.
+`.../Tables/dbo/orders`), or to an explicit `config(location=...)`.
+### OneLake: use GUID paths for now
+Address OneLake tables by **workspace GUID + lakehouse GUID**, not friendly names —
+`abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables/...`. This
+sidesteps an upstream `duckdb-delta` read bug ("No files in log segment") that is **already fixed
+upstream but still rolling out to production OneLake**. Friendly-name paths will work again once
+the fix finishes deploying.
 ### Fabric Lakehouse without a schema
@@ -95,7 +124,7 @@ let the schema fill that slot:
 ```yaml
       schema: Tables
-      root_path: "abfss://<ws>@onelake.dfs.fabric.microsoft.com/<lh>.Lakehouse"
+      root_path: "abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>"
 ```
 Since models are written to `<root_path>/<schema>/<model>`, this lands them at
@@ -309,7 +338,7 @@ unchanged since the call, else raises `CommitFailedError`.
 ```python
 import duckrun
-conn = duckrun.connect("abfss://ws@onelake.dfs.fabric.microsoft.com/lh.Lakehouse/Tables/dbo")
+conn = duckrun.connect("abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables/dbo")
 conn.sql("select * from orders").write.mode("overwrite").saveAsTable("orders_copy")
 conn.table("orders_copy").show()
@@ -355,6 +384,32 @@ None of this is required to use duckrun — `pip install duckrun` is unaffected.
 runs the official suite (above); `tests/correctness/` proves the concurrency guarantees. The cards
 in those docs are rendered live by CI, so they always reflect the latest `main`.
+## Limitations
+These are core design trade-offs, not bugs — they're inherent to gluing DuckDB to delta_rs and
+won't be "fixed" away:
+- **A single dbt run is single-threaded — but concurrency works fine.** This is purely a dbt-adapter
+  implementation detail: *within one dbt process* models run with `threads: 1`, because the
+  in-process delta_rs write path isn't thread-safe (parallel writes to a table in the *same* process
+  collide). It is **not** a limit on concurrent writers. Multiple independent writers — separate dbt
+  runs, notebooks, jobs, whatever — writing the same tables at the same time is fully supported and
+  safe: every write uses optimistic concurrency (snapshot-pinned MERGE, `safeappend` compare-and-swap,
+  fail-loud on a conflicting commit). So you can absolutely run many writers in parallel; you just
+  can't multi-thread the models *inside a single* dbt invocation.
+- **Two engines share one machine's memory.** DuckDB executes the SQL and delta_rs materializes the
+  Delta table — two separate memory systems in the same process, each with its own pool. Under heavy
+  memory pressure (large merges especially) the budget has to be split between them, and getting that
+  split right is fragile: delta_rs's merge spill-to-disk is itself flaky, and coordinating two
+  systems that don't know about each other's allocations is the hard, unavoidable part of this design.
+- **`DROP TABLE` is a soft tombstone, not a physical delete.** delta_rs has no `DROP`, and removing the
+  Delta files directly would be a filesystem hack that fails on object stores — so `conn.sql("drop
+  table x")` overwrites the table with a one-column tombstone marker and unregisters it. The table
+  vanishes from `conn.catalog` and discovery, and a later `create table x as …` revives the path with
+  real data, but the **files are not reclaimed** (a human purges them). One consequence: reading the
+  path *directly* (`conn.read.delta("…/x")`) bypasses discovery and returns the one-row tombstone
+  marker rather than erroring — address dropped tables by name, not by path.
 ## License
 MIT
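
Note on the README hunks above: the layout rule `<root_path>/<schema>/<model>` can be sketched in a few lines. This is only an illustration. `sketch_table_path` is a hypothetical standalone helper (not part of duckrun's API) that mirrors the string join visible in `DuckSession._register_view` in `session.py`.

```python
def sketch_table_path(root_path: str, schema: str, model: str) -> str:
    """Hypothetical helper: build <root_path>/<schema>/<model> as the README describes."""
    # Drop a trailing slash so './warehouse/' and './warehouse' give the same result.
    return f"{root_path.rstrip('/')}/{schema}/{model}"


# Local store, and the OneLake GUID form with the README's placeholders left as-is.
print(sketch_table_path("./warehouse", "dbo", "orders"))
print(sketch_table_path(
    "abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables",
    "dbo", "orders"))
```

With `schema: Tables` and a root path that stops at `<lakehouse_id>`, the same join yields `.../<lakehouse_id>/Tables/<model>`, which is the schema-less Lakehouse case from the README.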

{duckrun-0.3.17.dev7 → duckrun-0.3.19}/README.md RENAMED

@@ -6,9 +6,13 @@
 > not affiliated with, endorsed by, or supported by any employer or vendor. No warranty —
 > use it at your own risk.
-**duckrun** is a [dbt](https://www.getdbt.com/) adapter that runs your model SQL in
-**DuckDB** and writes the results to **Delta Lake** using
-[`delta_rs`](https://delta-io.github.io/delta-rs/) (the `deltalake` Python package).
+**duckrun** runs SQL in [DuckDB](https://duckdb.org/) and writes
+[**Delta Lake**](https://delta-io.github.io/delta-rs/) via delta_rs. It gives you:
+- a [**dbt**](https://www.getdbt.com/) adapter that materializes models as Delta tables;
+- a **`connect()`** helper to write Delta straight from SQL in a notebook;
+- **full snapshot isolation** from read to write — concurrent writers fail loud, never interleave.
 duckrun itself is just glue — it owns none of the heavy lifting. The real work is done
 by **DuckDB** (executes the SQL), **delta-rs** (writes the Delta table), **Arrow** (the
 zero-copy (kind of) bridge that hands query results from DuckDB to delta-rs), and **dbt** (orchestrates
@@ -38,6 +42,21 @@ pip install duckrun
 That single install pulls in `dbt-duckdb` (and therefore `duckdb`) plus `deltalake`.
+### In a Microsoft Fabric Python notebook
+duckrun needs `duckdb` ≥ 1.5.4 — the release where `delta_scan` gained its `version => N`
+parameter, which duckrun uses for snapshot-pinned reads. Fabric notebooks ship a **stable**
+`duckdb` release, which trails the newest one, so the `duckdb` already loaded in the kernel may
+predate 1.5.4. Upgrade, then restart the Python kernel so the new version loads.
+```python
+!pip install duckrun --upgrade
+notebookutils.session.restartPython()
+```
+If you skip the restart, duckrun fails loud at `connect()` (and on `dbt run`) and tells you to
+restart — it won't quietly run on the older `duckdb`/`deltalake` still bound in the kernel.
 ## Configure your profile
 ```yaml
@@ -50,12 +69,22 @@ my_project:
       # No `threads:` needed — duckrun always runs single-threaded.
       # DuckDB runs in-memory by default — the Delta tables are the only state.
       # Default Delta location for models that don't set config(location=...).
-      root_path: './warehouse'   # local path, or s3://..., gs://..., abfss://...
+      # OneLake — address by GUID, not friendly names (see "OneLake: use GUID paths" below):
+      root_path: "abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables"
+      # Or any other store: './warehouse' (local), 's3://...', 'gs://...'.
       # storage_options: {}      # passed through to deltalake for remote stores
 ```
 Persisted models are written to `<root_path>/<schema>/<model>` (e.g.
-`./warehouse/dbo/orders`), or to an explicit `config(location=...)`.
+`.../Tables/dbo/orders`), or to an explicit `config(location=...)`.
+### OneLake: use GUID paths for now
+Address OneLake tables by **workspace GUID + lakehouse GUID**, not friendly names —
+`abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables/...`. This
+sidesteps an upstream `duckdb-delta` read bug ("No files in log segment") that is **already fixed
+upstream but still rolling out to production OneLake**. Friendly-name paths will work again once
+the fix finishes deploying.
 ### Fabric Lakehouse without a schema
@@ -66,7 +95,7 @@ let the schema fill that slot:
 ```yaml
       schema: Tables
-      root_path: "abfss://<ws>@onelake.dfs.fabric.microsoft.com/<lh>.Lakehouse"
+      root_path: "abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>"
 ```
 Since models are written to `<root_path>/<schema>/<model>`, this lands them at
@@ -280,7 +309,7 @@ unchanged since the call, else raises `CommitFailedError`.
 ```python
 import duckrun
-conn = duckrun.connect("abfss://ws@onelake.dfs.fabric.microsoft.com/lh.Lakehouse/Tables/dbo")
+conn = duckrun.connect("abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables/dbo")
 conn.sql("select * from orders").write.mode("overwrite").saveAsTable("orders_copy")
 conn.table("orders_copy").show()
@@ -326,6 +355,32 @@ None of this is required to use duckrun — `pip install duckrun` is unaffected.
 runs the official suite (above); `tests/correctness/` proves the concurrency guarantees. The cards
 in those docs are rendered live by CI, so they always reflect the latest `main`.
+## Limitations
+These are core design trade-offs, not bugs — they're inherent to gluing DuckDB to delta_rs and
+won't be "fixed" away:
+- **A single dbt run is single-threaded — but concurrency works fine.** This is purely a dbt-adapter
+  implementation detail: *within one dbt process* models run with `threads: 1`, because the
+  in-process delta_rs write path isn't thread-safe (parallel writes to a table in the *same* process
+  collide). It is **not** a limit on concurrent writers. Multiple independent writers — separate dbt
+  runs, notebooks, jobs, whatever — writing the same tables at the same time is fully supported and
+  safe: every write uses optimistic concurrency (snapshot-pinned MERGE, `safeappend` compare-and-swap,
+  fail-loud on a conflicting commit). So you can absolutely run many writers in parallel; you just
+  can't multi-thread the models *inside a single* dbt invocation.
+- **Two engines share one machine's memory.** DuckDB executes the SQL and delta_rs materializes the
+  Delta table — two separate memory systems in the same process, each with its own pool. Under heavy
+  memory pressure (large merges especially) the budget has to be split between them, and getting that
+  split right is fragile: delta_rs's merge spill-to-disk is itself flaky, and coordinating two
+  systems that don't know about each other's allocations is the hard, unavoidable part of this design.
+- **`DROP TABLE` is a soft tombstone, not a physical delete.** delta_rs has no `DROP`, and removing the
+  Delta files directly would be a filesystem hack that fails on object stores — so `conn.sql("drop
+  table x")` overwrites the table with a one-column tombstone marker and unregisters it. The table
+  vanishes from `conn.catalog` and discovery, and a later `create table x as …` revives the path with
+  real data, but the **files are not reclaimed** (a human purges them). One consequence: reading the
+  path *directly* (`conn.read.delta("…/x")`) bypasses discovery and returns the one-row tombstone
+  marker rather than erroring — address dropped tables by name, not by path.
 ## License
 MIT

duckrun-0.3.19/dbt/adapters/duckrun/__version__.py ADDED

@@ -0,0 +1 @@
+ version = "0.3.19"

{duckrun-0.3.17.dev7 → duckrun-0.3.19}/dbt/adapters/duckrun/delta_dml.py RENAMED

@@ -129,6 +129,11 @@ _CREATE_TEMP_RE = re.compile(r"\s*create\s+(?:or\s+replace\s+)?(?:temp|temporary
 # verb would match inside an identifier (e.g. `update` within `last_update`).
 _LEADING_WITH = re.compile(r"\s*with\b", re.I)
 _DRIVING_DML = re.compile(r"\b(?:insert\s+into|update|delete\s+from)\b", re.I)
+# DuckDB numeric type names (DECIMAL(p,s) matches on the prefix). Used to scope the lossy-narrowing
+# guard to numeric→numeric casts only, leaving the intentional timestamp/string alignment untouched.
+_NUMERIC_TYPE_RE = re.compile(
+    r"^(?:TINYINT|SMALLINT|INTEGER|BIGINT|HUGEINT|UTINYINT|USMALLINT|UINTEGER|UBIGINT|UHUGEINT|"
+    r"FLOAT|REAL|DOUBLE|DECIMAL)\b", re.I)
 def _strip_leading(query: str) -> str:
@@ -382,10 +387,10 @@ class _DeltaDML:
         if self._with_clause:  # `WITH … INSERT INTO t SELECT …`: re-attach the CTE to the body
             body = f"{self._with_clause} {body}"
         cols = m.group("cols")
-        if cols:  # `insert into t (a, b) select …` → project the query onto the named columns
-            self._append_projected(loc, self._provided(cols), f"({body})")
-        else:  # column count/order already matches the target → append as-is
-            engine.write_delta(loc, self.cursor.sql(body), "append", storage_options=self.so)
+        # Always project onto the target schema — a column list maps by name, no list maps
+        # positionally. Routing both through _append_projected gives one place for the intentional
+        # type alignment AND the lossy-numeric-narrowing guard (so `insert … select 3.9` is caught too).
+        self._append_projected(loc, self._provided(cols) if cols else None, f"({body})")
     def _insert_values(self, m, rel, schema, loc) -> None:
         # `insert into <rel> [(<cols>)] values (...)`: the literals supply every target column when
@@ -420,6 +425,7 @@ class _DeltaDML:
         quoted = ", ".join('"' + c + '"' for c in provided)
         inner = f"{derived} v({quoted})"
+        self._reject_lossy_numeric_narrowing(inner, provided, dict(zip(target_cols, target_types)))
         exprs = [
             f'cast(v."{col}" as {typ}) as "{col}"' if col in provided_set
             else f'cast(null as {typ}) as "{col}"'
@@ -428,6 +434,43 @@ class _DeltaDML:
         data = self.cursor.sql(f"select {', '.join(exprs)} from {inner}")
         engine.write_delta(loc, data, "append", storage_options=self.so)
+    def _reject_lossy_numeric_narrowing(self, inner: str, provided, ttype) -> None:
+        """Fail loud when a supplied numeric value would be SILENTLY changed by the cast onto its
+        target column — e.g. inserting 3.9 into an INTEGER column (which lands 4). The cast in
+        :meth:`_append_projected` aligns types ON PURPOSE — timestamp ntz, int widening — and those are
+        lossless and intended, so this guard only fires for a numeric→numeric cast where the value does
+        NOT survive a round-trip through the target type. Non-numeric casts (timestamps, strings) are
+        deliberately left untouched. Raises ``ValueError`` naming the column and an example value.
+        Costs one extra evaluation of ``inner`` (trivial for VALUES; a second scan for ``insert …
+        select`` — acceptable to turn silent corruption into a loud error)."""
+        src = self.cursor.sql(
+            "select " + ", ".join(f'v."{c}"' for c in provided) + f" from {inner} limit 0")
+        stype = {c: str(t) for c, t in zip(provided, src.types)}
+        checks = []  # (col, lossy-predicate) for numeric→numeric casts that could narrow
+        for col in provided:
+            s, t = stype[col], ttype[col]
+            if s == t or not (_NUMERIC_TYPE_RE.match(s) and _NUMERIC_TYPE_RE.match(t)):
+                continue
+            # round-trip through the target type; try_cast so the probe itself never throws — an
+            # out-of-range value becomes NULL → distinct → flagged, same as a fractional loss.
+            checks.append(
+                (col, f'try_cast(try_cast(v."{col}" as {t}) as {s}) is distinct from v."{col}"'))
+        if not checks:
+            return
+        sel = ", ".join(
+            f'count(*) filter (where {pred}) as "n{i}", '
+            f'any_value(v."{col}") filter (where {pred}) as "ex{i}"'
+            for i, (col, pred) in enumerate(checks))
+        row = self.cursor.sql(f"select {sel} from {inner}").fetchone()
+        for i, (col, _) in enumerate(checks):
+            n, ex = row[2 * i], row[2 * i + 1]
+            if n:
+                raise ValueError(
+                    f"INSERT would silently narrow {n} value(s) for column '{col}' into "
+                    f"{ttype[col]} (e.g. {ex!r}). Cast explicitly in the SELECT/VALUES if intended."
+                )
     def _alter_add(self, m, rel, schema, loc) -> None:
         col = m.group("col").strip().strip('"')
         # Keep only the column type (drop any DEFAULT/NULL clause); add it as an all-null column by
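
Note on the `_reject_lossy_numeric_narrowing` hunk above: the guard builds one round-trip predicate per numeric→numeric cast and skips everything else. The sketch below reproduces only that selection step, without DuckDB, so the generated SQL can be inspected. `narrowing_checks` and the sample column types are hypothetical; the regex and the predicate text are copied from the diff.

```python
import re

# Same pattern as the diff's _NUMERIC_TYPE_RE: DuckDB numeric type names,
# with DECIMAL(p,s) matched on its prefix.
NUMERIC_TYPE_RE = re.compile(
    r"^(?:TINYINT|SMALLINT|INTEGER|BIGINT|HUGEINT|UTINYINT|USMALLINT|UINTEGER|UBIGINT|UHUGEINT|"
    r"FLOAT|REAL|DOUBLE|DECIMAL)\b", re.I)


def narrowing_checks(source_types: dict, target_types: dict) -> list:
    """Hypothetical helper: (column, predicate) pairs for numeric->numeric casts."""
    checks = []
    for col, s in source_types.items():
        t = target_types[col]
        # Skip identical types and any cast where either side is non-numeric.
        if s == t or not (NUMERIC_TYPE_RE.match(s) and NUMERIC_TYPE_RE.match(t)):
            continue
        # Round-trip through the target type; a changed or NULLed value is flagged.
        checks.append(
            (col, f'try_cast(try_cast(v."{col}" as {t}) as {s}) is distinct from v."{col}"'))
    return checks


# Made-up example schema: only "qty" is a numeric->numeric cast with differing types.
checks = narrowing_checks(
    {"qty": "DECIMAL(2,1)", "ts": "VARCHAR", "id": "INTEGER"},
    {"qty": "INTEGER", "ts": "TIMESTAMP", "id": "INTEGER"},
)
for col, pred in checks:
    print(col, "->", pred)
```

In the diff these predicates are then evaluated against the inserted rows with `count(*) filter (where ...)`, and the first column with a non-zero count raises `ValueError`.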

{duckrun-0.3.17.dev7 → duckrun-0.3.19}/dbt/adapters/duckrun/impl.py RENAMED

@@ -32,6 +32,10 @@ class DuckrunConnectionManager(DuckDBConnectionManager):
     @classmethod
     def open(cls, connection):
+        # Fail loud if the kernel still has Fabric's stale duckdb/deltalake loaded (installed an
+        # upgrade but skipped notebookutils.session.restartPython()). Lazy import: same wheel.
+        from duckrun._runtime import check_runtime_versions
+        check_runtime_versions()
         # duckrun runs single-threaded, so it uses ONE DuckDB connection for the whole run
         # (DuckrunEnvironment) instead of dbt-duckdb's per-handle cursors — see environment.py.
         # Pre-seed the base class's singleton _ENV with it for the local case; remote/MotherDuck

duckrun-0.3.19/duckrun/_runtime.py ADDED

@@ -0,0 +1,51 @@
+"""Runtime version guardrail.
+duckrun needs ``duckdb`` >= 1.5.4 — the release where ``delta_scan`` gained its ``version => N``
+parameter (used for snapshot-pinned reads) — and ``deltalake`` >= 1.5.0 (for the merge
+``max_spill_size`` cap). A Microsoft Fabric Python notebook ships a *stable* ``duckdb`` release,
+which trails the newest one, so the ``duckdb`` already imported in the kernel may predate 1.5.4.
+``pip install duckrun --upgrade`` writes the new wheels to disk, but the already-loaded modules stay
+bound until the kernel restarts — so a user who skips the restart would keep running on the older
+modules, quietly losing snapshot-pinned reads and the spill cap.
+This check turns that into a loud, actionable error. It inspects the *loaded* versions (not the
+pin), so it fires exactly on the forgot-to-restart case.
+"""
+from packaging.version import Version
+# Floors duckrun needs at *runtime* — keep in sync with the pins in pyproject.toml:
+#   duckdb 1.5.4    -> delta_scan('...', version => N) for snapshot-pinned incremental reads
+#   deltalake 1.5.0 -> max_spill_size on MERGE to cap merge RAM and avoid OOM on large upserts
+_MIN_DUCKDB = "1.5.4"
+_MIN_DELTALAKE = "1.5.0"
+_REMEDY = (
+    "In a Fabric Python notebook, upgrade then restart the kernel so the new versions load:\n"
+    "    !pip install duckrun --upgrade\n"
+    "    notebookutils.session.restartPython()\n"
+    "then re-run. (Elsewhere: pip install -U 'duckdb>={duckdb}' 'deltalake>={deltalake}' and "
+    "restart the interpreter.)"
+).format(duckdb=_MIN_DUCKDB, deltalake=_MIN_DELTALAKE)
+def check_runtime_versions():
+    """Raise ``RuntimeError`` if the *loaded* duckdb/deltalake are older than duckrun requires.
+    Catches the notebook "installed but forgot ``restartPython()``" case: the kernel keeps the
+    older duckdb/deltalake bound until restart. Idempotent and cheap; called at each entry point
+    (``duckrun.connect()`` and the dbt connection open).
+    """
+    import duckdb
+    import deltalake
+    too_old = []
+    if Version(duckdb.__version__) < Version(_MIN_DUCKDB):
+        too_old.append(f"duckdb {duckdb.__version__} (need >= {_MIN_DUCKDB})")
+    if Version(deltalake.__version__) < Version(_MIN_DELTALAKE):
+        too_old.append(f"deltalake {deltalake.__version__} (need >= {_MIN_DELTALAKE})")
+    if too_old:
+        raise RuntimeError(
+            "duckrun needs a newer " + " and ".join(too_old) + " than the kernel has loaded.\n"
+            + _REMEDY
+        )

{duckrun-0.3.17.dev7 → duckrun-0.3.19}/duckrun/session.py RENAMED

@@ -16,6 +16,7 @@ import duckdb
 from dbt.adapters.duckrun import delta_dml, engine, remote, secret
 from . import auth
+from ._runtime import check_runtime_versions
 # Statements that would WRITE to a table — rejected by the read-only conn.sql() with a pointer to
@@ -25,6 +26,8 @@ from . import auth
 # TEMP/TEMPORARY TABLE and CREATE VIEW are DuckDB-local scratch by design and pass through.
 _WRITE_KEYWORD_RE = re.compile(r"^(insert|update|delete|merge)\b", re.IGNORECASE)
 _CREATE_TABLE_RE = re.compile(r"^create\s+(or\s+replace\s+)?table\b", re.IGNORECASE)
+_DML_TARGET_RE = re.compile(
+    r"^(?:insert\s+into|delete\s+from|update)\s+(?P<rel>\"?[\w.]+\"?)", re.IGNORECASE)
 _CREATE_TEMP_RE = re.compile(r"^create\s+(or\s+replace\s+)?(temp|temporary)\b", re.IGNORECASE)
 # DML forms that genuinely can't be expressed through delta_rs (delta_dml.handle never applies them):
@@ -95,6 +98,34 @@ def _is_delta_write(query: str) -> bool:
     return bool(_CREATE_TABLE_RE.match(s)) and not _CREATE_TEMP_RE.match(s)
+def _delta_write_message(query: str) -> str:
+    """The error for a raw-SQL write conn.sql() can't route to delta_rs. For an INSERT/UPDATE/DELETE
+    whose target isn't a discovered Delta table — the common cause being a typo or a table written
+    out-of-band before refresh() — name the table and give form-appropriate guidance, instead of the
+    generic 'use the Spark write API' redirect (which misdirects: for UPDATE/DELETE the problem is the
+    missing table, not the API)."""
+    s = _strip_leading(query)
+    m = _DML_TARGET_RE.match(s)
+    if m:
+        rel = m.group("rel").strip('"')
+        verb = s.split(None, 1)[0].lower()
+        if verb in ("update", "delete"):
+            return (
+                f"conn.sql(): no Delta table '{rel}' to {verb}. conn.sql() DML only targets a "
+                f"discovered Delta table — check the name, or call conn.refresh() if it was just "
+                f"written out-of-band."
+            )
+        return (  # insert into a table that doesn't exist yet
+            f"conn.sql(): no Delta table '{rel}' to insert into. Create it first with "
+            f"df.write.saveAsTable('{rel}'), then insert."
+        )
+    return (  # a CREATE … AS that didn't resolve, or any other unrouted Delta write
+        "conn.sql() can't write a Delta table from raw SQL here. "
+        "Use the Spark write API: df.write.saveAsTable(...) to create/append, or "
+        "conn.delta_table(name).merge(...)/.delete()/.update()/.replaceWhere()."
+    )
 def _qid(name: str) -> str:
     """Quote a SQL identifier (schema/table/view name)."""
     return '"' + str(name).replace('"', '""') + '"'
@@ -105,6 +136,40 @@ def _qlit(text: str) -> str:
     return str(text).replace("'", "''")
+def _strip_query_context(msg: str) -> str:
+    """DuckDB appends the offending statement to errors as ``\\nLINE N: <sql>\\n   ^``. When that
+    statement is one duckrun generated internally (the ``delta_scan`` view), echoing it back is
+    noise that makes the failure look like it's about the caller's input. Keep the real error
+    text; drop the generated-SQL context."""
+    idx = msg.find("\nLINE ")
+    return msg[:idx].rstrip() if idx != -1 else msg
+_GUID = re.compile(r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$")
+def _onelake_guid_hint(root_path: str) -> Optional[str]:
+    """Workaround note for the OneLake ``delta_scan`` bug, shown only when a friendly-name
+    ``abfss://`` path is involved. OneLake's delta_scan can fail to enumerate a valid table's
+    ``_delta_log`` when the path uses friendly workspace/lakehouse names (duckdb-delta#307); the
+    GUID form reads fine. Returns ``None`` for non-abfss paths or paths already using GUIDs (no
+    point nagging those)."""
+    if not remote.is_abfss(root_path):
+        return None
+    workspace, _host, path = remote._parse_abfss(root_path)
+    lakehouse = path.split("/", 1)[0] if path else ""
+    if lakehouse.lower().endswith(".lakehouse"):
+        lakehouse = lakehouse[: -len(".Lakehouse")]
+    if _GUID.match(workspace) and _GUID.match(lakehouse):
+        return None
+    return (
+        "OneLake's delta_scan can fail to read a valid table's _delta_log when the abfss path uses "
+        "friendly names — a known upstream issue (duckdb-delta#307). Until it's fixed, use the "
+        "workspace and lakehouse GUIDs, e.g. "
+        "abfss://<workspace-guid>@onelake.dfs.fabric.microsoft.com/<lakehouse-guid>/Tables"
+    )
 def _split_root_schema(path: str, schema: Optional[str]):
     """Normalize ``path`` into ``(root_path, schema)``.
@@ -227,10 +292,22 @@ class DuckSession:
     def _register_view(self, schema: str, table: str):
         path = f"{self.root_path.rstrip('/')}/{schema}/{table}"
-        self.con.execute(
-            f"CREATE OR REPLACE VIEW {_qid(schema)}.{_qid(table)} AS "
-            f"SELECT * FROM delta_scan('{_qlit(path)}')"
-        )
+        try:
+            self.con.execute(
+                f"CREATE OR REPLACE VIEW {_qid(schema)}.{_qid(table)} AS "
+                f"SELECT * FROM delta_scan('{_qlit(path)}')"
+            )
+        except Exception as exc:
+            # delta_scan failed reading the table. Keep the real engine error (it's the signal —
+            # e.g. the OneLake "No files in log segment" delta-kernel bug), but drop DuckDB's echo
+            # of the CREATE VIEW statement *we* generated, and say which table/path it was. Suppress
+            # the chained original (`from None`) so the noisy SQL echo doesn't reappear in tracebacks.
+            hint = _onelake_guid_hint(self.root_path)
+            raise RuntimeError(
+                f"duckrun: could not read Delta table {schema}.{table} at '{path}':\n"
+                f"{_strip_query_context(str(exc))}"
+                + (f"\n\n{hint}" if hint else "")
+            ) from None
     def _set_search_path(self, schema: str):
         try:
@@ -275,11 +352,7 @@ class DuckSession:
             self.refresh(quiet=True)
             return DataFrame(self.con.sql("SELECT 'ok' AS status"), self)
         if _is_delta_write(query):
-            raise ValueError(
-                "conn.sql() can't write a Delta table from raw SQL here. "
-                "Use the Spark write API: df.write.saveAsTable(...) to create/append, or "
-                "conn.delta_table(name).merge(...)/.delete()/.update()/.replaceWhere()."
-            )
+            raise ValueError(_delta_write_message(query))
         return DataFrame(self.con.sql(query), self)
     def table(self, name: str) -> "DataFrame":
@@ -436,17 +509,17 @@ class DataFrameWriter:
         self._partition_by = list(cols)
         return self
-    def saveAsTable(self, name: str) -> str:
+    def _write(self, path: str, descr: str) -> None:
+        """Apply the configured mode to the Delta table at ``path`` (storage-neutral). ``descr``
+        names the target in the mode='error' message. Shared by saveAsTable and save."""
         session = self._df.session
-        schema, table = session.resolve(name)
-        path = session.table_path(schema, table)
         so = session.storage_options
         mode = self._mode
         if mode in ("error", "errorifexists"):
             if engine.table_exists(path, so):
                 raise ValueError(
-                    f"table '{schema}.{table}' already exists (mode='error'). "
+                    f"{descr} already exists (mode='error'). "
                     f"Use mode('overwrite'), mode('append'), mode('safeappend'), or mode('ignore')."
                 )
             mode = "overwrite"
@@ -487,6 +560,22 @@ class DataFrameWriter:
                 storage_options=so,
                 compaction_threshold=session.compaction_threshold,
             )
+    def save(self, path: str) -> str:
+        """Spark ``df.write.save(path)`` — write to a Delta table by PATH, not catalog name.
+        Storage-neutral (local / s3:// / gs:// / az:// / abfss://). Unlike :meth:`saveAsTable`,
+        the result is addressed only by ``path`` — there is no schema.table name to register a
+        view for — so it is read back with ``conn.read.delta(path)`` / ``delta_scan('<path>')``,
+        not as an unqualified table. Returns ``path``."""
+        self._write(path, f"delta table at '{path}'")
+        return path
+    def saveAsTable(self, name: str) -> str:
+        session = self._df.session
+        schema, table = session.resolve(name)
+        path = session.table_path(schema, table)
+        self._write(path, f"table '{schema}.{table}'")
         # Surface the (new or grown) table immediately — no manual refresh() needed.
         session.con.execute(f"CREATE SCHEMA IF NOT EXISTS {_qid(schema)}")
         session._register_view(schema, table)
@@ -573,4 +662,5 @@ def connect(path: str, storage_options: Optional[Dict[str, str]] = None,
         >>> conn.sql("SHOW TABLES").show()
         >>> conn.sql("select * from orders").write.mode("overwrite").saveAsTable("orders_copy")
     """
+    check_runtime_versions()  # fail loud if Fabric's stale duckdb/deltalake are still loaded
     return DuckSession(path, storage_options, schema, compaction_threshold)
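
Note on the `_register_view` error path above: two small pieces decide what the user sees, namely trimming DuckDB's statement echo and deciding whether the GUID hint applies. The sketch below exercises both with the standard library only. `strip_query_context` follows the diff line for line. `needs_guid_hint` is a hypothetical stand-in: it parses the `abfss://` URL itself because the body of `remote._parse_abfss` is not shown in this diff, and the sample error text and GUIDs are made up.

```python
import re

# Same GUID pattern as the diff's _GUID.
GUID = re.compile(r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$")


def strip_query_context(msg: str) -> str:
    """Drop DuckDB's trailing 'LINE N: <sql>' echo, keeping the real error text."""
    idx = msg.find("\nLINE ")
    return msg[:idx].rstrip() if idx != -1 else msg


def needs_guid_hint(root_path: str) -> bool:
    """Hypothetical: True only for abfss paths using friendly workspace/lakehouse names."""
    if not root_path.startswith("abfss://"):
        return False  # non-abfss stores never get the OneLake hint
    authority, _, path = root_path[len("abfss://"):].partition("/")
    workspace = authority.split("@", 1)[0]
    lakehouse = path.split("/", 1)[0]
    if lakehouse.lower().endswith(".lakehouse"):
        lakehouse = lakehouse[: -len(".lakehouse")]
    return not (GUID.match(workspace) and GUID.match(lakehouse))


# Made-up engine error with the kind of echo the diff strips.
raw = "No files in log segment\nLINE 1: CREATE OR REPLACE VIEW ...\n        ^"
print(strip_query_context(raw))
print(needs_guid_hint("abfss://ws@onelake.dfs.fabric.microsoft.com/lh.Lakehouse/Tables"))
```

Raising with `from None`, as the diff does, matters for the same reason as the trimming: without it the original exception, echo included, would reappear in the chained traceback.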

{duckrun-0.3.17.dev7 → duckrun-0.3.19/duckrun.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: duckrun
-Version: 0.3.17.dev7
+Version: 0.3.19
 Summary: A dbt adapter that runs SQL in DuckDB and materializes to Delta Lake (delta_rs).
 Author: mim
 License: MIT
@@ -12,7 +12,7 @@ Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: dbt-duckdb>=1.8
 Requires-Dist: dbt-core<2.0,>=1.8
-Requires-Dist: duckdb==1.5.4
+Requires-Dist: duckdb<1.6.0,>=1.5.4
 Requires-Dist: deltalake<1.5.1,>=1.5.0
 Requires-Dist: requests
 Provides-Extra: local
@@ -35,9 +35,13 @@ Dynamic: license-file
 > not affiliated with, endorsed by, or supported by any employer or vendor. No warranty —
 > use it at your own risk.
-**duckrun** is a [dbt](https://www.getdbt.com/) adapter that runs your model SQL in
-**DuckDB** and writes the results to **Delta Lake** using
-[`delta_rs`](https://delta-io.github.io/delta-rs/) (the `deltalake` Python package).
+**duckrun** runs SQL in [DuckDB](https://duckdb.org/) and writes
+[**Delta Lake**](https://delta-io.github.io/delta-rs/) via delta_rs. It gives you:
+- a [**dbt**](https://www.getdbt.com/) adapter that materializes models as Delta tables;
+- a **`connect()`** helper to write Delta straight from SQL in a notebook;
+- **full snapshot isolation** from read to write — concurrent writers fail loud, never interleave.
 duckrun itself is just glue — it owns none of the heavy lifting. The real work is done
 by **DuckDB** (executes the SQL), **delta-rs** (writes the Delta table), **Arrow** (the
 zero-copy (kind of) bridge that hands query results from DuckDB to delta-rs), and **dbt** (orchestrates
@@ -67,6 +71,21 @@ pip install duckrun
 That single install pulls in `dbt-duckdb` (and therefore `duckdb`) plus `deltalake`.
+### In a Microsoft Fabric Python notebook
+duckrun needs `duckdb` ≥ 1.5.4 — the release where `delta_scan` gained its `version => N`
+parameter, which duckrun uses for snapshot-pinned reads. Fabric notebooks ship a **stable**
+`duckdb` release, which trails the newest one, so the `duckdb` already loaded in the kernel may
+predate 1.5.4. Upgrade, then restart the Python kernel so the new version loads.
+```python
+!pip install duckrun --upgrade
+notebookutils.session.restartPython()
+```
+If you skip the restart, duckrun fails loud at `connect()` (and on `dbt run`) and tells you to
+restart — it won't quietly run on the older `duckdb`/`deltalake` still bound in the kernel.
 ## Configure your profile
 ```yaml
@@ -79,12 +98,22 @@ my_project:
       # No `threads:` needed — duckrun always runs single-threaded.
       # DuckDB runs in-memory by default — the Delta tables are the only state.
       # Default Delta location for models that don't set config(location=...).
-      root_path: './warehouse'   # local path, or s3://..., gs://..., abfss://...
+      # OneLake — address by GUID, not friendly names (see "OneLake: use GUID paths" below):
+      root_path: "abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables"
+      # Or any other store: './warehouse' (local), 's3://...', 'gs://...'.
       # storage_options: {}      # passed through to deltalake for remote stores
 ```
 Persisted models are written to `<root_path>/<schema>/<model>` (e.g.
-`./warehouse/dbo/orders`), or to an explicit `config(location=...)`.
+`.../Tables/dbo/orders`), or to an explicit `config(location=...)`.
+### OneLake: use GUID paths for now
+Address OneLake tables by **workspace GUID + lakehouse GUID**, not friendly names —
+`abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables/...`. This
+sidesteps an upstream `duckdb-delta` read bug ("No files in log segment") that is **already fixed
+upstream but still rolling out to production OneLake**. Friendly-name paths will work again once
+the fix finishes deploying.
 ### Fabric Lakehouse without a schema
@@ -95,7 +124,7 @@ let the schema fill that slot:
 ```yaml
       schema: Tables
-      root_path: "abfss://<ws>@onelake.dfs.fabric.microsoft.com/<lh>.Lakehouse"
+      root_path: "abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>"
 ```
 Since models are written to `<root_path>/<schema>/<model>`, this lands them at
@@ -309,7 +338,7 @@ unchanged since the call, else raises `CommitFailedError`.
 ```python
 import duckrun
-conn = duckrun.connect("abfss://ws@onelake.dfs.fabric.microsoft.com/lh.Lakehouse/Tables/dbo")
+conn = duckrun.connect("abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables/dbo")
 conn.sql("select * from orders").write.mode("overwrite").saveAsTable("orders_copy")
 conn.table("orders_copy").show()
@@ -355,6 +384,32 @@ None of this is required to use duckrun — `pip install duckrun` is unaffected.
 runs the official suite (above); `tests/correctness/` proves the concurrency guarantees. The cards
 in those docs are rendered live by CI, so they always reflect the latest `main`.
+## Limitations
+These are core design trade-offs, not bugs — they're inherent to gluing DuckDB to delta_rs and
+won't be "fixed" away:
+- **A single dbt run is single-threaded — but concurrency works fine.** This is purely a dbt-adapter
+  implementation detail: *within one dbt process* models run with `threads: 1`, because the
+  in-process delta_rs write path isn't thread-safe (parallel writes to a table in the *same* process
+  collide). It is **not** a limit on concurrent writers. Multiple independent writers — separate dbt
+  runs, notebooks, jobs, whatever — writing the same tables at the same time is fully supported and
+  safe: every write uses optimistic concurrency (snapshot-pinned MERGE, `safeappend` compare-and-swap,
+  fail-loud on a conflicting commit). So you can absolutely run many writers in parallel; you just
+  can't multi-thread the models *inside a single* dbt invocation.
+- **Two engines share one machine's memory.** DuckDB executes the SQL and delta_rs materializes the
+  Delta table — two separate memory systems in the same process, each with its own pool. Under heavy
+  memory pressure (large merges especially) the budget has to be split between them, and getting that
+  split right is fragile: delta_rs's merge spill-to-disk is itself flaky, and coordinating two
+  systems that don't know about each other's allocations is the hard, unavoidable part of this design.
+- **`DROP TABLE` is a soft tombstone, not a physical delete.** delta_rs has no `DROP`, and removing the
+  Delta files directly would be a filesystem hack that fails on object stores — so `conn.sql("drop
+  table x")` overwrites the table with a one-column tombstone marker and unregisters it. The table
+  vanishes from `conn.catalog` and discovery, and a later `create table x as …` revives the path with
+  real data, but the **files are not reclaimed** (they must be purged manually). One consequence: reading the
+  path *directly* (`conn.read.delta("…/x")`) bypasses discovery and returns the one-row tombstone
+  marker rather than erroring — address dropped tables by name, not by path.
 ## License
 MIT
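The Limitations text describes concurrent writers as safe because every write is optimistic: a writer records the table version it read, and its commit succeeds only if that version is still current. The toy model below illustrates that compare-and-swap rule with an in-memory log. It is not delta_rs or duckrun code; `ToyLog`, `CommitConflict`, and `commit` are invented for this illustration.

```python
import threading
from typing import List


class CommitConflict(Exception):
    """Raised when the table moved on since the writer took its snapshot."""


class ToyLog:
    """In-memory stand-in for a versioned table log (illustration only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: List[str] = []

    @property
    def version(self) -> int:
        # Version N means N commits have landed; an empty log is version 0.
        return len(self._entries)

    def commit(self, read_version: int, payload: str) -> int:
        """Append payload only if nobody committed after read_version."""
        with self._lock:
            if read_version != len(self._entries):
                raise CommitConflict(
                    f"read at version {read_version}, log is at {len(self._entries)}"
                )
            self._entries.append(payload)
            return len(self._entries)


log = ToyLog()
snapshot_a = log.version  # writer A pins version 0
snapshot_b = log.version  # writer B pins the same version
log.commit(snapshot_a, "rows from A")  # A wins and moves the log to version 1
try:
    log.commit(snapshot_b, "rows from B")  # B's snapshot is stale
except CommitConflict as exc:
    print("writer B must retry:", exc)
```

The losing writer fails loudly instead of interleaving its rows, which is the behaviour the README attributes to `safeappend` and the snapshot-pinned MERGE; re-reading and retrying is left to the caller in this sketch.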

{duckrun-0.3.17.dev7 → duckrun-0.3.19}/duckrun.egg-info/SOURCES.txt RENAMED

@@ -21,6 +21,7 @@ dbt/include/duckrun/macros/materializations/incremental.sql
 dbt/include/duckrun/macros/materializations/snapshot.sql
 dbt/include/duckrun/macros/materializations/table.sql
 duckrun/__init__.py
+duckrun/_runtime.py
 duckrun/auth.py
 duckrun/delta_table.py
 duckrun/session.py

{duckrun-0.3.17.dev7 → duckrun-0.3.19}/duckrun.egg-info/requires.txt RENAMED

@@ -1,6 +1,6 @@
 dbt-duckdb>=1.8
 dbt-core<2.0,>=1.8
-duckdb==1.5.4
+duckdb<1.6.0,>=1.5.4
 deltalake<1.5.1,>=1.5.0
 requests
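To make the `duckdb<1.6.0,>=1.5.4` range concrete, here is a deliberately simplified check written with the standard library only. It is not a full PEP 440 implementation (the `packaging` library is the usual tool for that); it understands plain `X.Y.Z` releases and `X.Y.Z.devN` builds, and `parse` and `in_duckdb_range` are hypothetical names. It mirrors two points made in the pyproject.toml comments: `<1.6.0` rejects `1.6.0.devN`, while a `1.5.x.devN` build above the floor would still fall inside the range when pre-releases are enabled.

```python
import re
from typing import Tuple

_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.dev(\d+))?$")


def parse(version: str) -> Tuple[Tuple[int, int, int], bool]:
    """Split 'X.Y.Z' or 'X.Y.Z.devN' into (release tuple, is_dev)."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, dev = match.groups()
    return (int(major), int(minor), int(patch)), dev is not None


def in_duckdb_range(version: str) -> bool:
    """Approximate 'duckdb>=1.5.4,<1.6.0' with pre-releases allowed (pip --pre)."""
    release, is_dev = parse(version)
    floor, cap = (1, 5, 4), (1, 6, 0)
    # A dev build sorts before its own release, so 1.5.4.devN is below the floor.
    meets_floor = release > floor or (release == floor and not is_dev)
    # Any build whose release segment is 1.6.0 or later is outside the cap,
    # which covers 1.6.0.devN as well as 1.6.0 itself.
    below_cap = release < cap
    return meets_floor and below_cap


for candidate in ("1.5.3", "1.5.4", "1.5.5", "1.5.5.dev1", "1.6.0.dev12", "1.6.0"):
    print(candidate, in_duckdb_range(candidate))
```

Release candidates, post-releases, and epochs are out of scope here; for anything beyond a quick sanity check, a real specifier parser is the safer choice.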

{duckrun-0.3.17.dev7 → duckrun-0.3.19}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "duckrun"
-version = "0.3.17.dev7"
+version = "0.3.19"
 description = "A dbt adapter that runs SQL in DuckDB and materializes to Delta Lake (delta_rs)."
 readme = "README.md"
 license = {text = "MIT"}
@@ -21,23 +21,22 @@ dependencies = [
     # fails at `from dbt.cli.main import dbtRunner`. Declared directly because we only depend on
     # dbt-core transitively, so the ceiling has to live here to bite.
     "dbt-core>=1.8,<2.0",
-    # Pin exactly 1.5.4 (stable). NOT a bare ">=1.5.4" floor: this project resolves with pip --pre
-    # (to pick up dbt pre-releases), and --pre is global, so an open floor lets pip pull the newest
-    # duckdb *prerelease* — verified: "--pre duckdb>=1.5.4" resolves to 1.6.0.dev12, an unstable,
-    # untested engine. An upper cap (<1.6.0) blocks 1.6.0.dev but a future 1.5.5.devN would still
-    # slip through under --pre, so pin exactly, matching the exact deltalake pin below.
-    # 1.5.4 is the first stable build whose bundled duckdb-delta
-    # extension supports `delta_scan('...', version => N)` (duckdb-delta #312) — the version-pinned
-    # read this project now relies on to make the incremental read and the write commit resolve at
-    # ONE Delta snapshot (Spark single-snapshot MERGE parity; see the staging-read pin in
-    # _delta_core.sql and merge_delta's read_version). The earlier 1.5.2+ "No files in log segment"
-    # read regression is avoided by addressing OneLake tables via GUID (workspace_id/lakehouse_id)
-    # abfss paths, and is fixed upstream.
+    # duckdb floor is 1.5.4 with a <1.6.0 cap (a floor, NOT an exact pin): 1.5.4 is the first
+    # stable build whose bundled duckdb-delta extension supports `delta_scan('...', version => N)`
+    # (duckdb-delta #312) — the version-pinned read this project relies on to make the incremental
+    # read and the write commit resolve at ONE Delta snapshot (Spark single-snapshot MERGE parity;
+    # see the staging-read pin in _delta_core.sql and merge_delta's read_version). Stable 1.5.x
+    # patches above the floor are fine. The <1.6.0 cap matters because this project resolves with
+    # pip --pre (to pick up dbt pre-releases) and --pre is global: an open floor would let pip pull
+    # an unstable duckdb *prerelease* (verified: "--pre duckdb>=1.5.4" resolves to 1.6.0.dev12).
+    # Per PEP 440, "<1.6.0" excludes 1.6.0 AND its prereleases (1.6.0.devN), so no 1.6.0 dev build
+    # slips in (a future 1.5.x.devN above the floor would still match under --pre). The earlier
+    # 1.5.2+ "No files in log segment" read regression is avoided by addressing OneLake tables via
+    # GUID (workspace_id/lakehouse_id) abfss paths, and is fixed upstream.
     # deltalake floor stays 1.5.0 (not just a ceiling): 1.5.0 is the first release with MERGE
     # disk-spill config (max_spill_size), which engine.merge_delta relies on to cap the merge's RAM
     # and avoid OOM on large upserts; the matching <1.5.1 ceiling avoids the deltalake delta-log
     # write-side regression, pinning exactly 1.5.0.
-    "duckdb==1.5.4",
+    "duckdb>=1.5.4,<1.6.0",
     "deltalake>=1.5.0,<1.5.1",
     # The top-level connection API (duckrun.connect) discovers OneLake tables via the DFS REST
     # API directly; requests is otherwise only a transitive dbt dependency.