duckrun 0.1.1.tar.gz → 0.1.2.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: duckrun
-Version: 0.1.1
+Version: 0.1.2
 Summary: Lakehouse task runner powered by DuckDB for Microsoft Fabric
 License-Expression: MIT
 Project-URL: Homepage, https://github.com/djouallah/duckrun
@@ -14,10 +14,16 @@ Requires-Dist: deltalake>=0.18.2
 Requires-Dist: requests>=2.28.0
 Dynamic: license-file
 
-# 🦆 Duckrun
+
+<img src="duckrun.png" width="400" alt="Duckrun">
 
 Simple task runner for Microsoft Fabric Python notebooks, powered by DuckDB and Delta_rs.
 
+
+## Known Limitation
+
+Supports only lakehouses with schemas; workspace and lakehouse names must not contain spaces.
+
 ## Installation
 
 ```bash
@@ -50,6 +56,10 @@ pipeline = [
 lakehouse.run(pipeline)
 ```
 
+## Early Exit
+
+If a task fails, the pipeline stops without running the subsequent tasks.
+
 ## How It Works
 
 Duckrun runs two types of tasks:
@@ -117,8 +127,8 @@ Use `__` to create variants of the same table:
 
 ```python
 pipeline = [
-    ('sales__initial', 'overwrite', {}),      # writes to 'sales' table
-    ('sales__incremental', 'append', {}),     # appends to 'sales' table
+    ('sales__initial', 'overwrite'),          # writes to 'sales' table
+    ('sales__incremental', 'append'),         # appends to 'sales' table
 ]
 ```
 
@@ -134,32 +144,7 @@ lakehouse.sql("SELECT * FROM my_table LIMIT 10").show()
 df = lakehouse.sql("SELECT COUNT(*) FROM sales").df()
 ```
 
-## Real-World Example
 
-```python
-import duckrun as dr
-
-lakehouse = dr.connect(
-    workspace="Analytics",
-    lakehouse_name="Sales",
-    schema="dbo",
-    sql_folder="./etl"
-)
-
-# Daily pipeline
-daily = [
-    ('download_files', (api_url, local_path)),
-    ('staging_orders', 'overwrite', {'run_date': '2024-06-01'}),
-    ('staging_customers', 'overwrite', {'run_date': '2024-06-01'}),
-    ('fact_sales', 'append'),
-    ('dim_customer', 'overwrite')
-]
-
-lakehouse.run(daily)
-
-# Check results
-lakehouse.sql("SELECT COUNT(*) FROM fact_sales").show()
-```
 
 ## Remote SQL Files
 
@@ -1,7 +1,13 @@
-# 🦆 Duckrun
+
+<img src="duckrun.png" width="400" alt="Duckrun">
 
 Simple task runner for Microsoft Fabric Python notebooks, powered by DuckDB and Delta_rs.
 
+
+## Known Limitation
+
+Supports only lakehouses with schemas; workspace and lakehouse names must not contain spaces.
+
 ## Installation
 
 ```bash
@@ -34,6 +40,10 @@ pipeline = [
 lakehouse.run(pipeline)
 ```
 
+## Early Exit
+
+If a task fails, the pipeline stops without running the subsequent tasks.
+
 ## How It Works
 
 Duckrun runs two types of tasks:
@@ -101,8 +111,8 @@ Use `__` to create variants of the same table:
 
 ```python
 pipeline = [
-    ('sales__initial', 'overwrite', {}),      # writes to 'sales' table
-    ('sales__incremental', 'append', {}),     # appends to 'sales' table
+    ('sales__initial', 'overwrite'),          # writes to 'sales' table
+    ('sales__incremental', 'append'),         # appends to 'sales' table
 ]
 ```
 
@@ -118,32 +128,7 @@ lakehouse.sql("SELECT * FROM my_table LIMIT 10").show()
 df = lakehouse.sql("SELECT COUNT(*) FROM sales").df()
 ```
 
-## Real-World Example
 
-```python
-import duckrun as dr
-
-lakehouse = dr.connect(
-    workspace="Analytics",
-    lakehouse_name="Sales",
-    schema="dbo",
-    sql_folder="./etl"
-)
-
-# Daily pipeline
-daily = [
-    ('download_files', (api_url, local_path)),
-    ('staging_orders', 'overwrite', {'run_date': '2024-06-01'}),
-    ('staging_customers', 'overwrite', {'run_date': '2024-06-01'}),
-    ('fact_sales', 'append'),
-    ('dim_customer', 'overwrite')
-]
-
-lakehouse.run(daily)
-
-# Check results
-lakehouse.sql("SELECT COUNT(*) FROM fact_sales").show()
-```
 
 ## Remote SQL Files
 
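The "Early Exit" section added in both README hunks describes stop-on-first-failure semantics. A minimal sketch of that control flow, assuming a `run_task` callable that raises on failure (a hypothetical stand-in, not duckrun's internals):

```python
# Hedged sketch of early exit: stop at the first failing task so the
# remaining tasks never run. `run_task` is a hypothetical executor.
def run(pipeline, run_task):
    for position, task in enumerate(pipeline, start=1):
        try:
            run_task(task)
        except Exception as exc:
            print(f"Task {position} ({task[0]!r}) failed: {exc}")
            return False    # early exit: subsequent tasks are skipped
    return True
```

Whether the real runner returns a status or re-raises is not shown in this diff; the point is only that tasks downstream of a failure never execute.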
@@ -64,9 +64,14 @@ class Duckrun:
     def _attach_lakehouse(self):
         self._create_onelake_secret()
         try:
+            # Exclude Iceberg metadata folders when scanning for Delta tables
            list_tables_query = f"""
                SELECT DISTINCT(split_part(file, '_delta_log', 1)) as tables
                FROM glob ("abfss://{self.workspace}@onelake.dfs.fabric.microsoft.com/{self.lakehouse_name}.Lakehouse/Tables/*/*/_delta_log/*.json")
+               WHERE file NOT LIKE '%/metadata/%'
+                 AND file NOT LIKE '%/iceberg/%'
+                 AND split_part(file, '_delta_log', 1) NOT LIKE '%/metadata'
+                 AND split_part(file, '_delta_log', 1) NOT LIKE '%/iceberg'
            """
            list_tables_df = self.con.sql(list_tables_query).df()
            list_tables = list_tables_df['tables'].tolist() if not list_tables_df.empty else []
@@ -82,18 +87,27 @@ class Duckrun:
                if len(parts) >= 2:
                    potential_schema = parts[-2]
                    table = parts[-1]
+
+                   # Skip Iceberg-related folders
+                   if table in ('metadata', 'iceberg') or potential_schema in ('metadata', 'iceberg'):
+                       continue
+
                    if potential_schema == self.schema:
                        try:
                            self.con.sql(f"""
                                CREATE OR REPLACE VIEW {table}
                                AS SELECT * FROM delta_scan('{self.table_base_url}{self.schema}/{table}');
                            """)
+                           print(f"  ✓ Attached: {table}")
                        except Exception as e:
-                           print(f"Error creating view for table {table}: {e}")
+                           print(f"  Skipped {table}: {str(e)[:100]}")
+                           continue
+
            print("\nAttached tables (views) in DuckDB:")
            self.con.sql("SELECT name FROM (SHOW ALL TABLES) WHERE database='memory'").show()
        except Exception as e:
            print(f"Error attaching lakehouse: {e}")
+           print("Continuing without pre-attached tables.")
 
     def _normalize_table_name(self, name: str) -> str:
         """Extract base table name before first '__'"""
@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "duckrun"
-version = "0.1.1"
+version = "0.1.2"
 description = "Lakehouse task runner powered by DuckDB for Microsoft Fabric"
 readme = "README.md"
 license = "MIT"
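The pyproject bump mirrors the PKG-INFO change. To confirm the installed version after `pip install --upgrade duckrun` (standard-library metadata lookup, nothing duckrun-specific):

```python
# Verify which duckrun version is installed after upgrading.
from importlib.metadata import version
print(version("duckrun"))   # expected: 0.1.2
```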