duckrun 0.2.9.dev4.tar.gz → 0.2.18.dev4.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of duckrun might be problematic.
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/PKG-INFO +114 -67
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/README.md +113 -66
- duckrun-0.2.18.dev4/duckrun/__init__.py +11 -0
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/duckrun/auth.py +21 -15
- duckrun-0.2.18.dev4/duckrun/core.py +1487 -0
- duckrun-0.2.18.dev4/duckrun/notebook.py +322 -0
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/duckrun/runner.py +15 -45
- duckrun-0.2.18.dev4/duckrun/semantic_model.py +847 -0
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/duckrun/stats.py +115 -30
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/duckrun/writer.py +35 -6
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/duckrun.egg-info/PKG-INFO +114 -67
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/duckrun.egg-info/SOURCES.txt +1 -0
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/pyproject.toml +2 -2
- duckrun-0.2.9.dev4/duckrun/__init__.py +0 -10
- duckrun-0.2.9.dev4/duckrun/core.py +0 -884
- duckrun-0.2.9.dev4/duckrun/semantic_model.py +0 -427
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/LICENSE +0 -0
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/duckrun/files.py +0 -0
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/duckrun/lakehouse.py +0 -0
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/duckrun.egg-info/dependency_links.txt +0 -0
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/duckrun.egg-info/requires.txt +0 -0
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/duckrun.egg-info/top_level.txt +0 -0
- {duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/setup.cfg +0 -0
{duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/PKG-INFO

@@ -1,7 +1,7 @@
 Metadata-Version: 2.4
 Name: duckrun
-Version: 0.2.
-Summary:
+Version: 0.2.18.dev4
+Summary: Helper library for Fabric Python using duckdb, arrow and delta_rs (orchestration, queries, etc.)
 Author: mim
 License: MIT
 Project-URL: Homepage, https://github.com/djouallah/duckrun
@@ -20,7 +20,7 @@ Dynamic: license-file
 
 <img src="https://raw.githubusercontent.com/djouallah/duckrun/main/duckrun.png" width="400" alt="Duckrun">
 
-A helper package for
+A helper package for working with Microsoft Fabric lakehouses - orchestration, SQL queries, and file management powered by DuckDB.
 
 ## Important Notes
 
@@ -28,7 +28,6 @@ A helper package for stuff that made my life easier when working with Fabric Pyt
 - Lakehouse must have a schema (e.g., `dbo`, `sales`, `analytics`)
 - **Workspace names with spaces are fully supported!** ✅
 
-
 **Delta Lake Version:** This package uses an older version of deltalake to maintain row size control capabilities, which is crucial for Power BI performance optimization. The newer Rust-based deltalake versions don't yet support the row group size parameters that are essential for optimal DirectLake performance.
 
 ## What It Does
@@ -40,12 +39,15 @@ It does orchestration, arbitrary SQL statements, and file manipulation. That's i
 ```bash
 pip install duckrun
 ```
-
+
+For local usage (requires Azure CLI or interactive browser auth):
 
 ```bash
 pip install duckrun[local]
 ```
 
+Note: When running locally, your internet speed will be the main bottleneck.
+
 ## Quick Start
 
 ### Simple Example for New Users
@@ -163,9 +165,6 @@ con.sql("""
     GROUP BY customer_id
 """).write.mode("overwrite").saveAsTable("customer_totals")
 
-# Append mode
-con.sql("SELECT * FROM new_orders").write.mode("append").saveAsTable("orders")
-
 # Schema evolution and partitioning (exact Spark API compatibility)
 con.sql("""
     SELECT
@@ -324,6 +323,73 @@ pipeline = [
 
 ## Advanced Features
 
+### SQL Lookup Functions
+
+Duckrun automatically registers helper functions that allow you to resolve workspace and lakehouse names from GUIDs directly in SQL queries. These are especially useful when working with storage logs or audit data that contains workspace/lakehouse IDs.
+
+**Available Functions:**
+
+```python
+con = duckrun.connect("workspace/lakehouse.lakehouse/dbo")
+
+# ID → Name lookups (most common use case)
+con.sql("""
+    SELECT
+        workspace_id,
+        get_workspace_name(workspace_id) as workspace_name,
+        lakehouse_id,
+        get_lakehouse_name(workspace_id, lakehouse_id) as lakehouse_name
+    FROM storage_logs
+""").show()
+
+# Name → ID lookups (reverse)
+con.sql("""
+    SELECT
+        workspace_name,
+        get_workspace_id_from_name(workspace_name) as workspace_id,
+        lakehouse_name,
+        get_lakehouse_id_from_name(workspace_id, lakehouse_name) as lakehouse_id
+    FROM configuration_table
+""").show()
+```
+
+**Function Reference:**
+
+- `get_workspace_name(workspace_id)` - Convert workspace GUID to display name
+- `get_lakehouse_name(workspace_id, lakehouse_id)` - Convert lakehouse GUID to display name
+- `get_workspace_id_from_name(workspace_name)` - Convert workspace name to GUID
+- `get_lakehouse_id_from_name(workspace_id, lakehouse_name)` - Convert lakehouse name to GUID
+
+**Features:**
+- ✅ **Automatic Caching**: Results are cached to avoid repeated API calls
+- ✅ **NULL on Error**: Returns `NULL` instead of errors for missing or inaccessible items
+- ✅ **Fabric API Integration**: Resolves names using Microsoft Fabric REST API
+- ✅ **Always Available**: Functions are automatically registered on connection
+
+**Example Use Case:**
+
+```python
+# Enrich OneLake storage logs with friendly names
+con = duckrun.connect("Analytics/Monitoring.lakehouse/dbo")
+
+result = con.sql("""
+    SELECT
+        workspace_id,
+        get_workspace_name(workspace_id) as workspace_name,
+        lakehouse_id,
+        get_lakehouse_name(workspace_id, lakehouse_id) as lakehouse_name,
+        operation_name,
+        COUNT(*) as operation_count,
+        SUM(bytes_transferred) as total_bytes
+    FROM onelake_storage_logs
+    WHERE log_date = CURRENT_DATE
+    GROUP BY ALL
+    ORDER BY workspace_name, lakehouse_name
+""").show()
+```
+
+This makes it easy to create human-readable reports from GUID-based log data!
+
 ### Schema Evolution & Partitioning
 
 Handle evolving schemas and optimize query performance with partitioning:
@@ -420,6 +486,37 @@ success = con.run(pipeline) # Returns True only if ALL tasks succeed
 
 This prevents downstream tasks from processing incomplete or corrupted data.
 
+### Semantic Model Deployment
+
+Deploy Power BI semantic models directly from BIM files using DirectLake mode:
+
+```python
+# Connect to lakehouse
+con = duckrun.connect("Analytics/Sales.lakehouse/dbo")
+
+# Deploy with auto-generated name (lakehouse_schema)
+con.deploy("https://raw.githubusercontent.com/user/repo/main/model.bim")
+
+# Deploy with custom name
+con.deploy(
+    "https://raw.githubusercontent.com/user/repo/main/sales_model.bim",
+    dataset_name="Sales Analytics Model",
+    wait_seconds=10  # Wait for permission propagation
+)
+```
+
+**Features:**
+- 🚀 **DirectLake Mode**: Deploys semantic models with DirectLake connection
+- 🔄 **Automatic Configuration**: Auto-configures workspace, lakehouse, and schema connections
+- 📦 **BIM from URL**: Load model definitions from GitHub or any accessible URL
+- ⏱️ **Permission Handling**: Configurable wait time for permission propagation
+
+**Use Cases:**
+- Deploy semantic models as part of CI/CD pipelines
+- Version control your semantic models in Git
+- Automated model deployment across environments
+- Streamline DirectLake model creation
+
 ### Delta Lake Optimization
 
 Duckrun automatically:
@@ -436,63 +533,6 @@ con = duckrun.connect(
 )
 ```
 
-## File Management API Reference
-
-### `copy(local_folder, remote_folder, file_extensions=None, overwrite=False)`
-
-Upload files from a local folder to OneLake Files section.
-
-**Parameters:**
-- `local_folder` (str): Path to local folder containing files to upload
-- `remote_folder` (str): **Required** target folder path in OneLake Files
-- `file_extensions` (list, optional): Filter by file extensions (e.g., `['.csv', '.parquet']`)
-- `overwrite` (bool, optional): Whether to overwrite existing files (default: False)
-
-**Returns:** `True` if all files uploaded successfully, `False` otherwise
-
-**Examples:**
-```python
-# Upload all files to a target folder
-con.copy("./data", "processed_data")
-
-# Upload only CSV and Parquet files
-con.copy("./reports", "monthly_reports", ['.csv', '.parquet'])
-
-# Upload with overwrite enabled
-con.copy("./backup", "daily_backup", overwrite=True)
-```
-
-### `download(remote_folder="", local_folder="./downloaded_files", file_extensions=None, overwrite=False)`
-
-Download files from OneLake Files section to a local folder.
-
-**Parameters:**
-- `remote_folder` (str, optional): Source folder path in OneLake Files (default: root)
-- `local_folder` (str, optional): Local destination folder (default: "./downloaded_files")
-- `file_extensions` (list, optional): Filter by file extensions (e.g., `['.csv', '.json']`)
-- `overwrite` (bool, optional): Whether to overwrite existing local files (default: False)
-
-**Returns:** `True` if all files downloaded successfully, `False` otherwise
-
-**Examples:**
-```python
-# Download all files from OneLake Files root
-con.download()
-
-# Download from specific folder
-con.download("processed_data", "./local_data")
-
-# Download only JSON files
-con.download("config", "./configs", ['.json'])
-```
-
-**Important Notes:**
-- Files are uploaded/downloaded to/from the **OneLake Files section**, not Delta Tables
-- The `remote_folder` parameter is **required** for uploads to prevent accidental uploads
-- Both methods default to `overwrite=False` for safety
-- Folder structure is preserved during upload/download operations
-- Progress is reported with file names, sizes, and upload/download status
-
 ## Complete Example
 
 ```python
@@ -534,6 +574,12 @@ con.sql("""
 
 # 5. Download processed files for external systems
 con.download("processed_reports", "./exports", ['.csv'])
+
+# 6. Deploy semantic model for Power BI
+con.deploy(
+    "https://raw.githubusercontent.com/user/repo/main/sales_model.bim",
+    dataset_name="Sales Analytics"
+)
 ```
 
 **This example demonstrates:**
@@ -541,8 +587,9 @@ con.download("processed_reports", "./exports", ['.csv'])
 - 🔄 **Pipeline orchestration** with SQL and Python tasks
 - ⚡ **Fast data exploration** with DuckDB
 - 💾 **Delta table creation** with Spark-style API
--
--
+- 🔀 **Schema evolution** and partitioning
+- 📤 **File downloads** from OneLake Files
+- 📊 **Semantic model deployment** with DirectLake
 
 ## Schema Evolution & Partitioning Guide
 
{duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/README.md

The README.md hunks are identical to the README body of the PKG-INFO diff above (the same changes, shifted by the 20-line metadata header), plus one additional hunk at the end of the file that rewrites only the closing `MIT` line (apparently a trailing-newline change):

@@ -592,4 +639,4 @@ For a complete production example, see [fabric_demo](https://github.com/djoualla
 
 ## License
 
-MIT
+MIT
duckrun-0.2.18.dev4/duckrun/__init__.py

@@ -0,0 +1,11 @@
+"""Duckrun - Lakehouse task runner powered by DuckDB"""
+
+from duckrun.core import Duckrun
+from duckrun.notebook import import_notebook_from_web, import_notebook
+
+__version__ = "0.2.18.dev2"
+
+# Expose unified connect method at module level
+connect = Duckrun.connect
+
+__all__ = ["Duckrun", "connect", "import_notebook_from_web", "import_notebook"]
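For orientation, the new `__init__.py` re-exports the connection entry point used throughout the README at module level. A minimal usage sketch (the workspace/lakehouse path below is a placeholder, not taken from this diff):

```python
import duckrun

# duckrun.connect is Duckrun.connect exposed at module level (see __init__.py above)
con = duckrun.connect("My Workspace/Sales.lakehouse/dbo")  # placeholder path

# Ad-hoc query against the attached lakehouse schema, as shown in the README examples
con.sql("SELECT 42 AS answer").show()
```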
{duckrun-0.2.9.dev4 → duckrun-0.2.18.dev4}/duckrun/auth.py

@@ -2,9 +2,21 @@
 Enhanced authentication module for duckrun - supports multiple notebook environments
 """
 import os
+import sys
 from typing import Optional, Tuple
 
 
+def safe_print(message: str):
+    """Print message with safe encoding handling for Windows"""
+    try:
+        print(message)
+    except UnicodeEncodeError:
+        # Fallback: remove emojis and special chars
+        import re
+        clean_message = re.sub(r'[^\x00-\x7F]+', '', message)
+        print(clean_message)
+
+
 def get_token() -> Optional[str]:
     """
     Smart authentication that works across multiple environments:
@@ -20,7 +32,6 @@ def get_token() -> Optional[str]:
     # Check if we already have a cached token
     token_env = os.environ.get("AZURE_STORAGE_TOKEN")
     if token_env and token_env != "PLACEHOLDER_TOKEN_TOKEN_NOT_AVAILABLE":
-        print("✅ Using existing Azure Storage token")
         return token_env
 
     print("🔐 Starting Azure authentication...")
@@ -38,21 +49,16 @@
     except Exception as e:
         print(f"⚠️ Fabric notebook authentication failed: {e}")
 
-    #
+    # Try local/VS Code authentication (Azure CLI + browser)
+    print("🖥️ Trying local authentication (Azure CLI + browser fallback)...")
+    token = _get_local_token()
+    if token:
+        return token
+
+    # If local auth failed, fall back to device code flow
+    print("🔐 Falling back to device code flow for remote/headless environment...")
     try:
-
-        try:
-            import google.colab
-            print("🚀 Google Colab detected - using device code flow")
-            return _get_device_code_token()
-        except ImportError:
-            pass
-
-        # For all other environments (including VS Code), try Azure CLI first
-        # This includes local development, VS Code notebooks, etc.
-        print("🖥️ Local/VS Code environment detected - trying Azure CLI first, then browser fallback")
-        return _get_local_token()
-
+        return _get_device_code_token()
     except Exception as e:
         print(f"❌ Authentication failed: {e}")
         print("💡 Try refreshing and running again, or check your Azure permissions")