duckrun 0.1.6.2__tar.gz → 0.1.7__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: duckrun
- Version: 0.1.6.2
+ Version: 0.1.7
  Summary: Lakehouse task runner powered by DuckDB for Microsoft Fabric
  Author: mim
  License: MIT
@@ -20,7 +20,7 @@ Dynamic: license-file

  <img src="https://raw.githubusercontent.com/djouallah/duckrun/main/duckrun.png" width="400" alt="Duckrun">

- Simple task runner for Microsoft Fabric Python notebooks, powered by DuckDB and Delta Lake.
+ A helper package for stuff that made my life easier when working with Fabric Python notebooks. Just the things that actually made sense to me - nothing fancy

  ## Important Notes

@@ -30,6 +30,10 @@ Simple task runner for Microsoft Fabric Python notebooks, powered by DuckDB and

  **Why no spaces?** Duckrun uses simple name-based paths instead of GUIDs. This keeps the code clean and readable, which is perfect for data engineering workspaces where naming conventions are already well-established. Just use underscores or hyphens instead: `my_workspace` or `my-lakehouse`.

+ ## What It Does
+
+ It does orchestration, arbitrary SQL statements, and file manipulation. That's it - just stuff I encounter in my daily workflow when working with Fabric notebooks.
+
  ## Installation

  ```bash
@@ -58,6 +62,10 @@ con.sql("SELECT * FROM my_table LIMIT 10").show()

  # Write to Delta tables (Spark-style API)
  con.sql("SELECT * FROM source").write.mode("overwrite").saveAsTable("target")
+
+ # Upload/download files to/from OneLake Files
+ con.copy("./local_folder", "target_folder")    # Upload files
+ con.download("target_folder", "./downloaded")  # Download files
  ```

  That's it! No `sql_folder` needed for data exploration.
@@ -97,7 +105,7 @@ con.sql("SELECT * FROM dbo_customers").show()
  con.sql("SELECT * FROM bronze_raw_data").show()
  ```

- ## Two Ways to Use Duckrun
+ ## Three Ways to Use Duckrun

  ### 1. Data Exploration (Spark-Style API)

@@ -127,7 +135,38 @@ con.sql("SELECT * FROM new_orders").write.mode("append").saveAsTable("orders")

  **Note:** `.format("delta")` is optional - Delta is the default format!

- ### 2. Pipeline Orchestration
+ ### 2. File Management (OneLake Files)
+
+ Upload and download files to/from OneLake Files section (not Delta tables):
+
+ ```python
+ con = duckrun.connect("workspace/lakehouse.lakehouse/dbo")
+
+ # Upload files to OneLake Files (remote_folder is required)
+ con.copy("./local_data", "uploaded_data")
+
+ # Upload only specific file types
+ con.copy("./reports", "daily_reports", ['.csv', '.parquet'])
+
+ # Upload with overwrite enabled (default is False for safety)
+ con.copy("./backup", "backups", overwrite=True)
+
+ # Download files from OneLake Files
+ con.download("uploaded_data", "./downloaded")
+
+ # Download only CSV files from a specific folder
+ con.download("daily_reports", "./reports", ['.csv'])
+ ```
+
+ **Key Features:**
+ - ✅ **Files go to OneLake Files section** (not Delta Tables)
+ - ✅ **`remote_folder` parameter is required** for uploads (prevents accidental uploads)
+ - ✅ **`overwrite=False` by default** (safer - prevents accidental overwrites)
+ - ✅ **File extension filtering** (e.g., only `.csv` or `.parquet` files)
+ - ✅ **Preserves folder structure** during upload/download
+ - ✅ **Progress reporting** with file sizes and upload status
+
+ ### 3. Pipeline Orchestration

  For production workflows with reusable SQL and Python tasks:

@@ -286,6 +325,63 @@ con = duckrun.connect(
  )
  ```

+ ## File Management API Reference
+
+ ### `copy(local_folder, remote_folder, file_extensions=None, overwrite=False)`
+
+ Upload files from a local folder to OneLake Files section.
+
+ **Parameters:**
+ - `local_folder` (str): Path to local folder containing files to upload
+ - `remote_folder` (str): **Required** target folder path in OneLake Files
+ - `file_extensions` (list, optional): Filter by file extensions (e.g., `['.csv', '.parquet']`)
+ - `overwrite` (bool, optional): Whether to overwrite existing files (default: False)
+
+ **Returns:** `True` if all files uploaded successfully, `False` otherwise
+
+ **Examples:**
+ ```python
+ # Upload all files to a target folder
+ con.copy("./data", "processed_data")
+
+ # Upload only CSV and Parquet files
+ con.copy("./reports", "monthly_reports", ['.csv', '.parquet'])
+
+ # Upload with overwrite enabled
+ con.copy("./backup", "daily_backup", overwrite=True)
+ ```
+
+ ### `download(remote_folder="", local_folder="./downloaded_files", file_extensions=None, overwrite=False)`
+
+ Download files from OneLake Files section to a local folder.
+
+ **Parameters:**
+ - `remote_folder` (str, optional): Source folder path in OneLake Files (default: root)
+ - `local_folder` (str, optional): Local destination folder (default: "./downloaded_files")
+ - `file_extensions` (list, optional): Filter by file extensions (e.g., `['.csv', '.json']`)
+ - `overwrite` (bool, optional): Whether to overwrite existing local files (default: False)
+
+ **Returns:** `True` if all files downloaded successfully, `False` otherwise
+
+ **Examples:**
+ ```python
+ # Download all files from OneLake Files root
+ con.download()
+
+ # Download from specific folder
+ con.download("processed_data", "./local_data")
+
+ # Download only JSON files
+ con.download("config", "./configs", ['.json'])
+ ```
+
+ **Important Notes:**
+ - Files are uploaded/downloaded to/from the **OneLake Files section**, not Delta Tables
+ - The `remote_folder` parameter is **required** for uploads to prevent accidental uploads
+ - Both methods default to `overwrite=False` for safety
+ - Folder structure is preserved during upload/download operations
+ - Progress is reported with file names, sizes, and upload/download status
+
  ## Complete Example

  ```python
@@ -294,7 +390,10 @@ import duckrun
  # Connect (specify schema for best performance)
  con = duckrun.connect("Analytics/Sales.lakehouse/dbo", sql_folder="./sql")

- # Pipeline with mixed tasks
+ # 1. Upload raw data files to OneLake Files
+ con.copy("./raw_data", "raw_uploads", ['.csv', '.json'])
+
+ # 2. Pipeline with mixed tasks
  pipeline = [
      # Download raw data (Python)
      ('fetch_api_data', ('https://api.example.com/sales', 'raw')),
@@ -309,20 +408,30 @@ pipeline = [
      ('sales_history', 'append')
  ]

- # Run
+ # Run pipeline
  success = con.run(pipeline)

- # Explore results
+ # 3. Explore results using DuckDB
  con.sql("SELECT * FROM regional_summary").show()

- # Export to new table
+ # 4. Export to new Delta table
  con.sql("""
      SELECT region, SUM(total) as grand_total
      FROM regional_summary
      GROUP BY region
  """).write.mode("overwrite").saveAsTable("region_totals")
+
+ # 5. Download processed files for external systems
+ con.download("processed_reports", "./exports", ['.csv'])
  ```

+ **This example demonstrates:**
+ - 📁 **File uploads** to OneLake Files section
+ - 🔄 **Pipeline orchestration** with SQL and Python tasks
+ - ⚡ **Fast data exploration** with DuckDB
+ - 💾 **Delta table creation** with Spark-style API
+ - 📤 **File downloads** from OneLake Files
+
  ## How It Works

  1. **Connection**: Duckrun connects to your Fabric lakehouse using OneLake and Azure authentication
@@ -127,77 +127,57 @@ class Duckrun:
          self._attach_lakehouse()

      @classmethod
-     def connect(cls, workspace: Union[str, None] = None, lakehouse_name: Optional[str] = None,
-                 schema: str = "dbo", sql_folder: Optional[str] = None,
+     def connect(cls, connection_string: str, sql_folder: Optional[str] = None,
                  compaction_threshold: int = 100):
          """
          Create and connect to lakehouse.

-         Supports two formats:
-         1. Compact: connect("ws/lh.lakehouse/schema", sql_folder=...) or connect("ws/lh.lakehouse")
-         2. Traditional: connect("ws", "lh", "schema", sql_folder) or connect("ws", "lh")
+         Uses compact format: connect("ws/lh.lakehouse/schema") or connect("ws/lh.lakehouse")

          Args:
-             workspace: Workspace name or full path "ws/lh.lakehouse/schema"
-             lakehouse_name: Lakehouse name (optional if using compact format)
-             schema: Schema name (defaults to "dbo")
+             connection_string: OneLake path "ws/lh.lakehouse/schema" or "ws/lh.lakehouse"
              sql_folder: Optional path or URL to SQL files folder
              compaction_threshold: File count threshold for compaction

          Examples:
-             # Compact format (second param treated as sql_folder if it's a URL/path string)
-             dr = Duckrun.connect("temp/power.lakehouse/wa", "https://github.com/.../sql/")
-             dr = Duckrun.connect("ws/lh.lakehouse/schema", "./sql")
+             dr = Duckrun.connect("ws/lh.lakehouse/schema", sql_folder="./sql")
              dr = Duckrun.connect("ws/lh.lakehouse/schema") # no SQL folder
-
-             # Traditional format
-             dr = Duckrun.connect("ws", "lh", "schema", "./sql")
-             dr = Duckrun.connect("ws", "lh", "schema")
+             dr = Duckrun.connect("ws/lh.lakehouse") # defaults to dbo schema
          """
          print("Connecting to Lakehouse...")

          scan_all_schemas = False

-         # Check if using compact format: "ws/lh.lakehouse/schema" or "ws/lh.lakehouse"
-         # If second param looks like a path/URL and not a lakehouse name, treat it as sql_folder
-         if workspace and "/" in workspace and (lakehouse_name is None or
-             (isinstance(lakehouse_name, str) and ('/' in lakehouse_name or lakehouse_name.startswith('http') or lakehouse_name.startswith('.')))):
-
-             # If lakehouse_name looks like a sql_folder, shift it
-             if lakehouse_name and ('/' in lakehouse_name or lakehouse_name.startswith('http') or lakehouse_name.startswith('.')):
-                 sql_folder = lakehouse_name
-                 lakehouse_name = None
-
-             parts = workspace.split("/")
-             if len(parts) == 2:
-                 workspace, lakehouse_name = parts
-                 scan_all_schemas = True
-                 print(f"ℹ️ No schema specified. Using default schema 'dbo' for operations.")
-                 print(f" Scanning all schemas for table discovery...\n")
-             elif len(parts) == 3:
-                 workspace, lakehouse_name, schema = parts
-             else:
-                 raise ValueError(
-                     f"Invalid connection string format: '{workspace}'. "
-                     "Expected format: 'workspace/lakehouse.lakehouse' or 'workspace/lakehouse.lakehouse/schema'"
-                 )
-
-             if lakehouse_name.endswith(".lakehouse"):
-                 lakehouse_name = lakehouse_name[:-10]
-         elif lakehouse_name is not None:
-             # Traditional format - check if schema was explicitly provided
-             if schema == "dbo":
-                 scan_all_schemas = True
-                 print(f"ℹ️ No schema specified. Using default schema 'dbo' for operations.")
-                 print(f" Scanning all schemas for table discovery...\n")
+         # Only support compact format: "ws/lh.lakehouse/schema" or "ws/lh.lakehouse"
+         if not connection_string or "/" not in connection_string:
+             raise ValueError(
+                 "Invalid connection string format. "
+                 "Expected format: 'workspace/lakehouse.lakehouse/schema' or 'workspace/lakehouse.lakehouse'"
+             )
+
+         parts = connection_string.split("/")
+         if len(parts) == 2:
+             workspace, lakehouse_name = parts
+             scan_all_schemas = True
+             schema = "dbo"
+             print(f"ℹ️ No schema specified. Using default schema 'dbo' for operations.")
+             print(f" Scanning all schemas for table discovery...\n")
+         elif len(parts) == 3:
+             workspace, lakehouse_name, schema = parts
+         else:
+             raise ValueError(
+                 f"Invalid connection string format: '{connection_string}'. "
+                 "Expected format: 'workspace/lakehouse.lakehouse' or 'workspace/lakehouse.lakehouse/schema'"
+             )
+
+         if lakehouse_name.endswith(".lakehouse"):
+             lakehouse_name = lakehouse_name[:-10]

          if not workspace or not lakehouse_name:
              raise ValueError(
-                 "Missing required parameters. Use either:\n"
+                 "Missing required parameters. Use compact format:\n"
                  " connect('workspace/lakehouse.lakehouse/schema', 'sql_folder')\n"
-                 " connect('workspace/lakehouse.lakehouse') # defaults to dbo\n"
-                 " connect('workspace', 'lakehouse', 'schema', 'sql_folder')\n"
-                 " connect('workspace', 'lakehouse') # defaults to dbo"
+                 " connect('workspace/lakehouse.lakehouse') # defaults to dbo"
              )

          return cls(workspace, lakehouse_name, schema, sql_folder, compaction_threshold, scan_all_schemas)
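The hunk above drops the traditional multi-argument `connect()` signature, so call sites written against 0.1.6.x need to move to the compact connection string. A minimal migration sketch (the workspace, lakehouse, and schema names are placeholders taken from the docstring examples, not from the package):

```python
import duckrun

# 0.1.6.x traditional form (removed in 0.1.7) -- no longer accepted:
#   con = duckrun.connect("ws", "lh", "schema", "./sql")

# 0.1.7 compact form with an explicit schema
con = duckrun.connect("ws/lh.lakehouse/schema", sql_folder="./sql")

# Schema omitted: operations default to dbo and all schemas are scanned for discovery
con_all = duckrun.connect("ws/lh.lakehouse")
```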
@@ -210,7 +190,7 @@ class Duckrun:
          if token != "PLACEHOLDER_TOKEN_TOKEN_NOT_AVAILABLE":
              self.con.sql(f"CREATE OR REPLACE SECRET onelake (TYPE AZURE, PROVIDER ACCESS_TOKEN, ACCESS_TOKEN '{token}')")
          else:
-             print("Please login to Azure CLI")
+             print("Authenticating with Azure (trying CLI, will fallback to browser if needed)...")
              from azure.identity import AzureCliCredential, InteractiveBrowserCredential, ChainedTokenCredential
              credential = ChainedTokenCredential(AzureCliCredential(), InteractiveBrowserCredential())
              token = credential.get_token("https://storage.azure.com/.default")
@@ -227,7 +207,7 @@ class Duckrun:
          """
          token = self._get_storage_token()
          if token == "PLACEHOLDER_TOKEN_TOKEN_NOT_AVAILABLE":
-             print("Getting Azure token for table discovery...")
+             print("Authenticating with Azure for table discovery (trying CLI, will fallback to browser if needed)...")
              from azure.identity import AzureCliCredential, InteractiveBrowserCredential, ChainedTokenCredential
              credential = ChainedTokenCredential(AzureCliCredential(), InteractiveBrowserCredential())
              token_obj = credential.get_token("https://storage.azure.com/.default")
@@ -506,6 +486,246 @@ class Duckrun:
          print('='*60)
          return True

+     def copy(self, local_folder: str, remote_folder: str,
+              file_extensions: Optional[List[str]] = None,
+              overwrite: bool = False) -> bool:
+         """
+         Copy files from a local folder to OneLake Files section.
+
+         Args:
+             local_folder: Path to local folder containing files to upload
+             remote_folder: Target subfolder path in OneLake Files (e.g., "reports/daily") - REQUIRED
+             file_extensions: Optional list of file extensions to filter (e.g., ['.csv', '.parquet'])
+             overwrite: Whether to overwrite existing files (default: False)
+
+         Returns:
+             True if all files uploaded successfully, False otherwise
+
+         Examples:
+             # Upload all files from local folder to a target folder
+             dr.copy("./local_data", "uploaded_data")
+
+             # Upload only CSV files to a specific subfolder
+             dr.copy("./reports", "daily_reports", ['.csv'])
+
+             # Upload with overwrite enabled
+             dr.copy("./backup", "backups", overwrite=True)
+         """
+         if not os.path.exists(local_folder):
+             print(f"❌ Local folder not found: {local_folder}")
+             return False
+
+         if not os.path.isdir(local_folder):
+             print(f"❌ Path is not a directory: {local_folder}")
+             return False
+
+         # Get Azure token
+         token = self._get_storage_token()
+         if token == "PLACEHOLDER_TOKEN_TOKEN_NOT_AVAILABLE":
+             print("Authenticating with Azure for file upload (trying CLI, will fallback to browser if needed)...")
+             from azure.identity import AzureCliCredential, InteractiveBrowserCredential, ChainedTokenCredential
+             credential = ChainedTokenCredential(AzureCliCredential(), InteractiveBrowserCredential())
+             token_obj = credential.get_token("https://storage.azure.com/.default")
+             token = token_obj.token
+             os.environ["AZURE_STORAGE_TOKEN"] = token
+
+         # Setup OneLake Files URL (not Tables)
+         files_base_url = f'abfss://{self.workspace}@onelake.dfs.fabric.microsoft.com/{self.lakehouse_name}.Lakehouse/Files/'
+         store = AzureStore.from_url(files_base_url, bearer_token=token)
+
+         # Collect files to upload
+         files_to_upload = []
+         for root, dirs, files in os.walk(local_folder):
+             for file in files:
+                 local_file_path = os.path.join(root, file)
+
+                 # Filter by extensions if specified
+                 if file_extensions:
+                     _, ext = os.path.splitext(file)
+                     if ext.lower() not in [e.lower() for e in file_extensions]:
+                         continue
+
+                 # Calculate relative path from local_folder
+                 rel_path = os.path.relpath(local_file_path, local_folder)
+
+                 # Build remote path in OneLake Files (remote_folder is now mandatory)
+                 remote_path = f"{remote_folder.strip('/')}/{rel_path}".replace("\\", "/")
+
+                 files_to_upload.append((local_file_path, remote_path))
+
+         if not files_to_upload:
+             print(f"No files found to upload in {local_folder}")
+             if file_extensions:
+                 print(f" (filtered by extensions: {file_extensions})")
+             return True
+
+         print(f"📁 Uploading {len(files_to_upload)} files from '{local_folder}' to OneLake Files...")
+         print(f" Target folder: {remote_folder}")
+
+         uploaded_count = 0
+         failed_count = 0
+
+         for local_path, remote_path in files_to_upload:
+             try:
+                 # Check if file exists (if not overwriting)
+                 if not overwrite:
+                     try:
+                         obs.head(store, remote_path)
+                         print(f" ⏭ Skipped (exists): {remote_path}")
+                         continue
+                     except Exception:
+                         # File doesn't exist, proceed with upload
+                         pass
+
+                 # Read local file
+                 with open(local_path, 'rb') as f:
+                     file_data = f.read()
+
+                 # Upload to OneLake Files
+                 obs.put(store, remote_path, file_data)
+
+                 file_size = len(file_data)
+                 size_mb = file_size / (1024 * 1024) if file_size > 1024*1024 else file_size / 1024
+                 size_unit = "MB" if file_size > 1024*1024 else "KB"
+
+                 print(f" ✓ Uploaded: {local_path} → {remote_path} ({size_mb:.1f} {size_unit})")
+                 uploaded_count += 1
+
+             except Exception as e:
+                 print(f" ❌ Failed: {local_path} → {remote_path} | Error: {str(e)[:100]}")
+                 failed_count += 1
+
+         print(f"\n{'='*60}")
+         if failed_count == 0:
+             print(f"✅ Successfully uploaded all {uploaded_count} files to OneLake Files")
+         else:
+             print(f"⚠ Uploaded {uploaded_count} files, {failed_count} failed")
+         print(f"{'='*60}")
+
+         return failed_count == 0
+
+     def download(self, remote_folder: str = "", local_folder: str = "./downloaded_files",
+                  file_extensions: Optional[List[str]] = None,
+                  overwrite: bool = False) -> bool:
+         """
+         Download files from OneLake Files section to a local folder.
+
+         Args:
+             remote_folder: Optional subfolder path in OneLake Files to download from
+             local_folder: Local folder path to download files to (default: "./downloaded_files")
+             file_extensions: Optional list of file extensions to filter (e.g., ['.csv', '.parquet'])
+             overwrite: Whether to overwrite existing local files (default: False)
+
+         Returns:
+             True if all files downloaded successfully, False otherwise
+
+         Examples:
+             # Download all files from OneLake Files root
+             dr.download()
+
+             # Download only CSV files from a specific subfolder
+             dr.download("daily_reports", "./reports", ['.csv'])
+         """
+         # Get Azure token
+         token = self._get_storage_token()
+         if token == "PLACEHOLDER_TOKEN_TOKEN_NOT_AVAILABLE":
+             print("Authenticating with Azure for file download (trying CLI, will fallback to browser if needed)...")
+             from azure.identity import AzureCliCredential, InteractiveBrowserCredential, ChainedTokenCredential
+             credential = ChainedTokenCredential(AzureCliCredential(), InteractiveBrowserCredential())
+             token_obj = credential.get_token("https://storage.azure.com/.default")
+             token = token_obj.token
+             os.environ["AZURE_STORAGE_TOKEN"] = token
+
+         # Setup OneLake Files URL (not Tables)
+         files_base_url = f'abfss://{self.workspace}@onelake.dfs.fabric.microsoft.com/{self.lakehouse_name}.Lakehouse/Files/'
+         store = AzureStore.from_url(files_base_url, bearer_token=token)
+
+         # Create local directory
+         os.makedirs(local_folder, exist_ok=True)
+
+         # List files in OneLake Files
+         print(f"📁 Discovering files in OneLake Files...")
+         if remote_folder:
+             print(f" Source folder: {remote_folder}")
+             prefix = f"{remote_folder.strip('/')}/"
+         else:
+             prefix = ""
+
+         try:
+             list_stream = obs.list(store, prefix=prefix)
+             files_to_download = []
+
+             for batch in list_stream:
+                 for obj in batch:
+                     remote_path = obj["path"]
+
+                     # Filter by extensions if specified
+                     if file_extensions:
+                         _, ext = os.path.splitext(remote_path)
+                         if ext.lower() not in [e.lower() for e in file_extensions]:
+                             continue
+
+                     # Calculate local path
+                     if remote_folder:
+                         rel_path = os.path.relpath(remote_path, remote_folder.strip('/'))
+                     else:
+                         rel_path = remote_path
+
+                     local_path = os.path.join(local_folder, rel_path).replace('/', os.sep)
+                     files_to_download.append((remote_path, local_path))
+
+             if not files_to_download:
+                 print(f"No files found to download")
+                 if file_extensions:
+                     print(f" (filtered by extensions: {file_extensions})")
+                 return True
+
+             print(f"📥 Downloading {len(files_to_download)} files to '{local_folder}'...")
+
+             downloaded_count = 0
+             failed_count = 0
+
+             for remote_path, local_path in files_to_download:
+                 try:
+                     # Check if local file exists (if not overwriting)
+                     if not overwrite and os.path.exists(local_path):
+                         print(f" ⏭ Skipped (exists): {local_path}")
+                         continue
+
+                     # Ensure local directory exists
+                     os.makedirs(os.path.dirname(local_path), exist_ok=True)
+
+                     # Download file
+                     data = obs.get(store, remote_path).bytes()
+
+                     # Write to local file
+                     with open(local_path, 'wb') as f:
+                         f.write(data)
+
+                     file_size = len(data)
+                     size_mb = file_size / (1024 * 1024) if file_size > 1024*1024 else file_size / 1024
+                     size_unit = "MB" if file_size > 1024*1024 else "KB"
+
+                     print(f" ✓ Downloaded: {remote_path} → {local_path} ({size_mb:.1f} {size_unit})")
+                     downloaded_count += 1
+
+                 except Exception as e:
+                     print(f" ❌ Failed: {remote_path} → {local_path} | Error: {str(e)[:100]}")
+                     failed_count += 1
+
+             print(f"\n{'='*60}")
+             if failed_count == 0:
+                 print(f"✅ Successfully downloaded all {downloaded_count} files from OneLake Files")
+             else:
+                 print(f"⚠ Downloaded {downloaded_count} files, {failed_count} failed")
+             print(f"{'='*60}")
+
+             return failed_count == 0
+
+         except Exception as e:
+             print(f"❌ Error listing files from OneLake: {e}")
+             return False
+
      def sql(self, query: str):
          """
          Execute raw SQL query with Spark-style write API.
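Since `copy` and `download` build their paths from the same OneLake Files root, and the connection already registers a DuckDB Azure secret (see the `CREATE OR REPLACE SECRET onelake` hunk above), uploaded files can plausibly be queried in place. A hedged sketch — the workspace/lakehouse names, the `uploaded_data` folder, and `sales.csv` are illustrative, and it assumes the registered secret also authorizes reads from the Files path:

```python
import duckrun

con = duckrun.connect("ws/lh.lakehouse/dbo")

# Upload local CSVs into the Files section
con.copy("./local_data", "uploaded_data", ['.csv'])

# Query one of the uploaded files directly; the URL mirrors files_base_url above
files_url = ("abfss://ws@onelake.dfs.fabric.microsoft.com/"
             "lh.Lakehouse/Files/uploaded_data/sales.csv")
con.sql(f"SELECT COUNT(*) AS n FROM read_csv_auto('{files_url}')").show()
```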
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "duckrun"
- version = "0.1.6.2"
+ version = "0.1.7"
  description = "Lakehouse task runner powered by DuckDB for Microsoft Fabric"
  readme = "README.md"
  license = {text = "MIT"}
File without changes
File without changes
File without changes
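To confirm the upgrade took effect, a minimal check (assuming the distribution is installed under the project name shown in the pyproject hunk):

```python
from importlib.metadata import version

# Expect "0.1.7" once the new release is installed
print(version("duckrun"))
```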